A Rumination On Wars Ahead
Remarks to the Military Testing Association
Washington,  D.C. 

26  October  1981 

Lt.  Gen.  P.  F.  Gorman,  U.S.  Army 


I  will  be  talking  forecasting  with  you,  but  let  me  make  it  clear  that 
mine  is  not  a  Schweitzerian  prediction  of  inevitable  drift  towards  war,  but 
rather  the  sort  of  concern  that  all  of  us  in  the  Pentagon  have  to  direct 
toward  the  future  in  order  to  do  our  job.  Bob  Schweitzer  aside,  I  would 
have you note that Nick "the Greek" Snyder quotes odds of ten to one that
there  will  be  a  war  involving  our  forces  within  the  coming  decade.  As  the 
present  Secretary  of  Defense  has  stated,  it  is  the  business  of  his  Department 
to  prepare  for  future  wars.  Wars  and  rumors  of  war  are  problems  with  which 
we  must  seriously  contend.  Let  me  begin  by  reading  you  a  quotation  from  the 
Defense  Guidance,  which  as  some  of  you  may  know,  is  a  document  wherein  the 
Secretary of Defense instructs the members of his own staff and the Joint
Chiefs  of  Staff  on  how  he  wants  them  to  prepare  for  the  future.  The  time 
frame  for  planning  in  the  current  draft  Defense  Guidance  is  the  fiscal 
years  1984  through  1988.  Thereby  you  can  see  we  are  already  conceptually 
far  into  the  future.  I  will  quote  from  a  section  of  the  guidance  which 
deals  with  our  using  to  advantage  those  elements  of  the  American  system 
wherein we might be able to enjoy some edge over the competition:


"An inherent advantage of our system is the ability to combine managerial
skills and technology to solve difficult problems. Getting
serious about competing with the Soviets will require that the U.S.
use its advantages to develop an overall strategy with several areas
of primary emphasis: (1) developing first-rate weapons and eliminating
some of the reliability problems we have recently experienced;
(2) expanding the competition into other areas where the Soviets
find it difficult to compete; (3) designing innovative forces and
operational tactics and procedures that act against the Soviet control
system and frustrate, disrupt, and defeat Soviet attempts to achieve
political/military objectives; and (4) developing cost-imposing
strategies whose basic aim is to obsolesce past Soviet investments."

Now, that guidance is a tall order. The purpose of my discussion is to
present considerations which someone in your line of work, as well as mine,
might bring to bear upon an attempt to comply, either from '84 to '88, or
in the years beyond. Let me start by laying out for you some of the trends
in warfare of which I am aware, and which you may want to take into
account.

Slide  2 

First  of  all,  the  long  sweeping  line  ascending  from  the  year  1860,  the 
time  of  our  Civil  War,  to  the  1990's  in  the  upper  right-hand  side  of  the  chart 
traces the amount of area that a battalion, a group of 600-800 men, was
expected to control given the firepower and mobility means available to them at
any  given  point  in  time.  The  fundamental  reason  for  having  military  forces  at 
all  is  to  control  land  and  people,  and  the  trends  are  all  in  the  direction  of 
doing  more  militarily  with  fewer  soldiers.  As  you  can  see,  I  have  extrapolated, 
like  most  futurists,  control  trends  beyond  the  present.  I  would  also  call  to 
your  attention  that,  like  most  futurists,  I  have  used  a  logarithmic  scale  on 
the  ordinate.  The  bars  graph  firepower,  for  which  the  measure  of  merit  is  the 
pounds  of  projectiles  that  a  division,  a  large  assemblage  of  battalions,  could 
hurl  at  an  enemy  in  the  course  of  an  hour,  all  guns  firing  maximum  rate,  for 
each soldier engaged. You will note that the bar graph heights do not keep
pace with the extension of the area of control, and that lack of direct
correlation is a function of both the superior mobility means that we have made
available to our land warfare organizations--large quantities of tracked vehicles
and helicopters, plus advanced communications--as well as the efficiency of
modern  ordnance--one  need  not  project  as  much  ordnance  in  terms  of  poundage  for 
greater  lethality,  given  more  efficient  conventional  munitions,  and  the 
so-called  "smart"  weaponry  in  the  force.  Finally,  on  the  slide,  slashing 
down  diagonally  over  the  trends  just  described,  is  a  measure  that  I  have 
called "division dispersion." Here I have taken an amount of the forward edge
of the battle area (FEBA) that the division is responsible for in normal
disposition, and computed a measure of men per kilometer of FEBA. And as you
can see, those trends plunge linearly downward, and my projections suggest to
you that they will keep going down through the 1990's. What that line tells
you  is  that  we  have  multiplied  the  ability  of  each  soldier  on  the  battlefield, 
or  that  the  density  of  men  on  the  battlefield  will  be  going  down,  and  going 
down  dramatically,  continuing  the  trends  that  we  have  observed  as  technology 
has  been  applied  to  battle  in  recent  years.  Each  soldier  in  battle  will  count 
for  more  toward  accomplishing  the  basic  mission  of  control  of  the  earth's 
surface. 

Slide  3 

I  believe  that  it  is  possible,  looking  into  the  future,  to  conclude  that, 
almost certainly, a technologically advanced combatant in future warfare will
be able to see all elements of an opposing force in real time, and will have
at  his  disposal  firepower  means  for  reaching  out  to  strike  throughout  the 
depth  of  the  opponent's  war-waging  apparatus  from  his  theater  forces  all  the 
way back to his strategic reserves. Some naval officers have found it reasonable
to say, vis-a-vis naval warfare, that it will be difficult if not impossible
to  steam  around  the  seas  with  forces  centered  on  a  large-decked  carrier,  with 
protective  rings  of  specialized  air  and  submarine  defense  ships  around  that 
carrier.  Some  air  officers  have  found  it  possible  to  say  that  we  will  have  to 
find alternatives to the operation of air forces from large fixed airfields,
whereon  aircraft  are  processed  for  high  sortie  generation  rates  on  something 
like  an  assembly  line  basis.  I  can  assert  for  land  warfare  that  the  day  will 
soon  be  gone  when  massed  formations  of  armored  vehicles  will  be  able  to  swarm 
over  the  surface  of  the  earth,  trailing  behind  elaborate  logistic  tails. 
Instead  we  are  going  to  have  to  move  toward  something  like  "distributed 
force,"  meaning  that  in  order  to  provide  protection  we  will  have  to  disperse 
more  broadly  and  thoroughly  than  ever  before,  and  thus  confuse  the  enemy  as  to 
which  elements  of  the  target  array  before  them  are  particularly  significant  as 
threat.  Our  tactical  dispositions  will  have  to  confront  our  foe  with  a  large 
complex  of  target  elements,  each  of  which  is  potentially  able  to  deliver 
punishing  firepower,  and  each  element  of  which  could  be  capable  of  developing 
the  intelligence  requisite  to  the  accurate  delivery  of  that  firepower. 

Now, there are enormous impediments, both technological and cultural, to
achieving such a capability. But I am convinced that the nation which is first
able  to  achieve  the  desiderata  that  I  have  sketched  will  exert  an  enormous 
superiority  over  potential  adversaries,  and  I  suggest  that  the  excerpt  of  the 
Defense Guidance that I just read you is quite right: it would be important
for  the  United  States,  and  any  other  nations  of  the  free  world  who  wish  to 
assist  in  the  competition  with  the  Soviet  Union,  to  bend  their  efforts  to 
field  first-rate  weapons,  and  invent  new  tactics  and  techniques  for  using 
them. 

Here is a description of future land warfare which I have drawn from a
presentation made to the Secretary of Defense just this past summer by the
Defense Science Board, which as many of you may know is a group of prestigious
scientists who provide advice to the Secretary on problems of particular
significance. This summer he asked several DSB Panels to look into the future
of warfare, and to advise him what technologies might be relevant to the
Department of Defense as it sought to implement the guidance that I read at
the outset of my remarks. Here you can see the notion of "distributed force"
in an athletic analog, a way of describing warfare which is always more
comfortable for Americans than most.

Slide  5 

But such amorphous warfare will generate a series of new needs or requirements,
of which the Defense Science Board cited these. In other words, these
needs are problems that technology has to solve in order for us to have the
capability to compete in land warfare as just described. The red signals on
the right indicate needs which I submit are as much sociological as they are
technological,  as  much  a  demand  for  cultural  evolution  as  they  are  matters 
for  advanced  science  and  materials.  And  it  is  to  the  needs  for  cultural  leaps 
that  I  will  eventually  direct  our  attention  here  today. 

But before I do so, let me quickly walk you through some of the technological
responses to these needs that the DSB Panel cited for the Secretary
of Defense.

Slide  6 

In the first place, they pointed out the United States faces a very difficult
choice, as indicated here: Shall we, as many advocate, settle for less
complicated, less technologically advanced weapon systems, systems of only
sufficient capability, and, since their price will be lower, thereby assure that
we can purchase more weapons? Or shall we rather take the course advocated by
the Defense Guidance and reach for excellence, knowing full well that, if we
do, we thereby expose ourselves to high expense, and concomitantly have
to settle for relatively few of those weapon systems? All pertinent trends are
depressing:

Slide  7 

Here  is  a  slide  used  by  Norman  Augustine,  Chairman  of  the  Defense  Science 
Board.  He  calls  it  "Calvin  Coolidge's  Revenge"  because  President  Coolidge, 
when presented the military budget for 1928, is reported to have said, anent an
item of some $25,000 for the purchase of a squadron of airplanes, "Why
can't we buy just one airplane and let the aviators take turns?" Mr. Augustine's
chart makes it evident that by the year 2054 the entire Defense Budget of the
United  States  will  be  able  to  purchase  just  one  airplane,  so  we  will  have  to 
let  the  Navy  use  it  3-1/2  days  each  week  and  the  Air  Force  the  remainder. 

And,  of  course,  by  the  year  2100,  it  predicts  that  the  entire  gross  national 
product  of  the  United  States  will  buy  just  one  airplane,  and  we  will  have 
unification  of  the  services  at  last. 

Slide  8 

Not  only  are  cost  trends  up,  but  as  the  systems  become  more  expensive,  the 
trends  are  to  buy  fewer  of  them:  whether  it's  carriers  of  the  NIMITZ  class, 
down  on  the  bottom  right,  the  AWACS,  or  various  advanced  air  defense  systems, 
the  more  expensive  the  system  is,  the  fewer  that  you  will  see  in  inventory. 
Hence, we are indeed very much in a numbers quandary.

Slide  9 

Moreover, as the complexity and costs of systems increase, reliability
seems  to  decrease.  Here  is  a  plot  Mr.  Augustine  has  put  together  comparing 
costs  of  various  items  of  recent  aviation  electronic  equipment  to  hours 
between  failure.  This  chart,  he  says,  illustrates  that  if  you  are  willing  to 
pay  enough  for  a  given  avionic  apparatus,  you  can  guarantee  that  it  won't  work 
at  all. 

However, this summer the Defense Science Board, after examining just such
trends  and  propositions,  came  to  this  rather  opposite  conclusion.  Why? 

Slide  11 

Well,  among  other  things  they  looked  at  pairings  of  advanced  systems  with 
older counterparts. For example, here the F-14 and the F-4, a Naval fighter
new and old; Patriot and Improved Hawk, SAM air defense systems new and old;
the F-15 and F-4E, Air Force fighter aircraft, new and old; and the M-1 and
M-60A3, Army tanks, new and old. The values on the ordinate express effectiveness
in terms of ratio of kill potential, new to old (you'll just have to
take the DSB Panel's word that these are reduced to some kind of common base for
comparison). And on the abscissa is plotted the ratio of acquisition costs,
new  to  old.  The  parity  line  represents  the  break  even  point,  system  for 
system.  As  you  can  see,  the  very  much  increased  effectiveness  of  the  new 
systems  drives  all  of  these  pairings  well  up  into  a  win  area  of  much  better 
than  parity:  3,  4,  6  to  1. 
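
The parity comparison on this slide can be sketched in a few lines. This is only an illustration: the function names and the sample ratios are hypothetical and are not taken from the DSB chart. A pairing plots above the parity line when its effectiveness ratio (new to old) exceeds its acquisition cost ratio (new to old).

```python
def effectiveness_per_cost(effectiveness_ratio: float, cost_ratio: float) -> float:
    """Relative effectiveness bought per unit of relative acquisition cost."""
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return effectiveness_ratio / cost_ratio


def beats_parity(effectiveness_ratio: float, cost_ratio: float) -> bool:
    """True when the pairing plots above the parity (break-even) line."""
    return effectiveness_per_cost(effectiveness_ratio, cost_ratio) > 1.0


# Hypothetical pairing: a new system 6 times as effective at 2 times the cost.
print(effectiveness_per_cost(6.0, 2.0))  # 3.0
print(beats_parity(6.0, 2.0))            # True
```

On these assumed numbers, each dollar spent on the new system buys three times the kill potential of a dollar spent on the old one, which is the sense in which a pairing falls in the "win area."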

Slide  12 

The same thing is true in terms of support costs, which get at the issue
of  keeping  systems  available  or  ready  for  use  in  the  force.  Support  costs 
include  the  costs  of  parts,  training  the  operator  and  the  maintainers  to  keep 
the  system  operating,  and  the  costs  of  requisite  basing.  Again,  the  picture 
here  is  favorable  for  the  new  systems.  So  it  is  possible,  according  to  the  DSB, 
to build into new systems both high availability rates and very high effectiveness,
and thereby to compensate for reduced numbers of systems. In fact, as
you  can  see,  with  effectiveness  ratios  of  5  to  1  or  so  over  systems  they  are 
replacing,  one  can  operate  with  significantly  fewer  elements  in  the  force. 




Indeed,  one  DSB  panel  made  for  the  Secretary  of  Defense  this  kind  of  a  case 
for high performance or high technology systems. We might buy twice as many
"half-performance systems," but the panel noted that the Soviets are building a
very large, very high performance force, against which a smaller lower performance
force would do very badly. Half-performance systems could not defeat
Soviet  systems  built  on  advanced  technology,  such  as  we  see  in  their  tanks  or 
in  their  titanium  hull  submarine.  Moreover,  high  performance  systems  give  us 
for  the  first  time  the  ability  to  operate  at  night,  in  adverse  weather,  and 
under  the  electronic  warfare  conditions  which  will  be  typical  of  the  future. 
Moreover, we can get more "sorties" out of such equipment than has been possible
in the past. (Incidentally, those of you who have been following
the dialogue in Congress over whether some particular fighter aircraft is
better than the one it is replacing should note that very frequently critics
of our latest airplanes will contend that the sortie rate of the new is below
that of a plane we had in World War II or Korea. You have to remember that
sortie rates for the present peacetime force are programmed by flying hours
and parts, and we don't fly these airplanes more often because we don't have
funds  to  do  so.  But  our  new  fighters,  in  tests  under  field  maneuver  conditions 
in  Europe  and  elsewhere,  have  demonstrated  a  capability  to  produce  sortie  rates 
two  and  three  times  what  we  have  had  in  wars  past.)  Perhaps  more  importantly, 
to  accept  lower  performance  systems  would  be  simply  to  accept  higher,  avoidable 
U.S.  casualties.  And  if  we  went  the  low-performance  route,  we  would  need  to 
increase  our  intake  of  military  manpower  overall  not  only  to  compensate  for 
losses,  but  to  operate  the  increased  numbers  of  systems  in  the  force.  True, 
these  could  conceivably  be  less  demanding  systems  in  terms  of  skill  requirements. 


But  higher  numbers  of  systems  would  increase  our  already  high  support  costs, 
and--the  point  needs  to  be  made  again  and  again--human  costs,  manpower  costs, 
will  dominate  the  life  cycle  cost  of  virtually  every  system  that  the  Department 
of  Defense  presently  has  under  procurement  or  under  research  and  development. 

And  then  finally,  of  course,  maintenance  support  for  larger  numbers  of  systems 
in the force would increase the strain on a force which is already stressed in
providing  for  its  large  logistic  tail,  its  training  base,  its  management,  housing 
for  dependents,  personnel  services,  and  headquarters  overhead.  The  last  line  on 
the  slide,  basing,  makes  the  additional  point  that  whether  you  are  talking  about 
aircraft  carrier  decks,  or  airfields,  or  tank  parks,  we  today  have  a  constrained 
basing  system  for  the  American  armed  forces;  putting  a  lot  more  systems  into  the 
field  could  create  problems  of  basing  for  us. 

Slide  14 

Well,  what  high  technology  should  we  reach  for  to  acquire  high  performance 
systems,  pursuant  to  the  Defense  Guidance?  Here  is  a  list  of  technologies 
which the Defense Science Board believes could make an order of magnitude difference
in our capability to meet the exigencies of future warfare. What you are
looking at are results of a Delphi technique, in which the panelists assigned
measures of merit to various technologies in order to assess relative opportunity
and  relative  risks,  from  which  they  derived  the  numerical  rating  on  the  right,  a 
function  of  both  opportunity  and  risk. 

I  am  going  to  talk  about  a  number  of  these  particular  technologies  in  a 
moment,  but  note  that  overall  the  Defense  Science  Board  commended  some  17 
technologies  to  the  Secretary.  I  have  deliberately  avoided  having  to  discuss 
some of the more arcane--such as "high density monolithic focal plane arrays"--in
order  to  focus  on  some  of  the  technologies  which  I  find,  interestingly  enough,  are 
both  at  the  top  of  the  list  and  prominent  for  their  sociological  or  cultural 
dimensions. 

For example, this one. On the right you see the investment posture in
the  Defense  Research  and  Development  budget  at  the  moment,  and  elsewhere  on 
the  slide,  an  indication  of  what  Defense  might  do  by  way  of  applications:  what 
kinds  of  opportunities  the  technology  could  provide.  I  would  have  you  note 
that  the  Defense  Science  Board  panel  identified  computer  software  as  the  highest 
risk  undertaking  among  all  Defense  development  undertakings. 

Slide  16 

This  chart,  rather  startling  to  some,  indicates  the  problem  that  we  face 
unless  we  are  able  to  find  ways  of  producing  computer  software  more  efficiently 
than  we  have  in  the  past.  Without  some  technological  intervention,  we  simply  are 
not  going  to  be  able  to  exploit  the  promise  of  smaller,  smarter  processors.  The 
United States is facing very severe shortages of system analysts, computer scientists,
and programmers. The research into advanced software methods to which I just
alluded,  does  not  hold  out  promise  for  relief  for  many  years,  but  it  is  hopeful, 
and  we  should  pursue  it. 

Slide  17 

And  then  you  have  a  possibility  of  improving  the  inherent  capability  of 
materiel  with  respect  to  its  availability  or  reliability  in  the  field.  In 
commending  this  particular  technology  to  the  Secretary  of  Defense,  the  Defense 
Science Board stated as follows:

"Reliability standards must be raised significantly--the technology
to support such increases is available--the adherence to these
standards the first time around is the most economical approach in
the long run. Front-end costs will be higher....Times and funds
must be programmed in the development cycle to accommodate necessary
redesign iterations after test and before Initial Operating Capability
for critical reliability, maintainability and producibility
problems (as well as performance problems)."

And then, lastly, for the moment at least, this particular technology.

A  number  of  recent  publications  have  made  it  evident  that  by  the  end  of  the 
decade of the 1980's there will be available machines, computers or processors,
which  will  be  comparable  in  size  and  in  weight  to  the  human  brain,  and  will 
have  comparable  requirements  for  input  and  energy,  and  which  will  have  quite 
comparable  capabilities  for  output.  Now  what  that  means  for  the  weapon  system 
developer  is  obvious. 

Slide  19 

Here  is  a  technology  which  was  not  cited  by  the  DSB,  but  one  I  picked  to 
highlight  further  the  human  dimension  of  the  problem.  This  is  a  mechanism  for 
propelling a projectile using electro-motive vice chemical energy, exploiting
the so-called Lorentz force. By the end of this calendar year, 1981, in the
Westinghouse R&D Center at Pittsburgh, Pennsylvania, the Defense Advanced Research
Projects Agency and the Army's Development and Readiness Command hope to have
operational  a  laboratory  device  which  will  be  able  to  propel  a  300-gram  mass  to  a 
speed  of  3  kilometers  per  second,  yielding  a  muzzle  energy  on  the  order  of  1.35 
megajoules.  In  effect,  this  laboratory  device  will  shoot  a  bullet  about  3  times 
faster  than  present  rifles  or  tank  guns,  and  opens  a  whole  new  realm  of  physical 
possibilities  with  respect  to  antiaircraft  guns,  tank  guns,  and  that  sort  of 
thing--Tom  Swift's  Electric  Gun  at  last.  Such  weapons,  if  they  ever  become 
practical,  will  also  propel  us  into  a  whole  new  realm  of  difficulties  with  that 
most  intransigent,  stultified,  subculture  within  military  sociology,  the 
artillery. 
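
The muzzle energy quoted for the laboratory device follows from the ordinary kinetic energy relation, one half of mass times velocity squared. A minimal check, with a hypothetical function name and the 300-gram and 3 km/s figures taken from the text:

```python
def muzzle_energy_joules(mass_kg: float, velocity_m_per_s: float) -> float:
    """Kinetic energy of a projectile at the muzzle: E = 1/2 * m * v^2."""
    return 0.5 * mass_kg * velocity_m_per_s ** 2


# 300 grams = 0.300 kg; 3 kilometers per second = 3000 m/s.
energy = muzzle_energy_joules(0.300, 3000.0)
print(f"{energy / 1e6:.2f} MJ")  # 1.35 MJ
```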

In 1977 when I took command of the 8th Division in Germany, I found this
state of affairs in my division artillery. I found that my artillerymen adhered
to their manpower-intensive ways of doing business as assiduously as their
forefathers had clung to horsepower. Application of high school physics--levers
and pulleys, inclined planes--could have bettered the situation, let alone
buying materiel handling equipment, or looking to robotics to solve the problem.
Let  me  read  to  you  from  a  recent  Army  War  College  pamphlet  on  the  Army  in  the 
year  2000. 

"In considering technology the Army must look introspectively at its
ability  to  use  advanced  technology  and  its  past  performance  in  this 
area.  For  over  20  years  there  has  been  the  technological  capability 
to  have  a  Howitzer  that  could  be  electronically  laid  (directed), 
fuzes  automatically  set,  rounds  automatically  rammed,  muzzle  velocity 
(for  future  corrections)  electronically  measured  and  firing  data 
electronically  computed  from  an  electronic  sensing.  The  actual 
condition  is  that  there  are  many  artillery  commanders  taking  great 
pride  in  the  fact  that  they  never  fire  their  Howitzers  using  only 
the FADAC (a very old computer which is dependent on mobile generators
usually in short supply). These commanders insist on checking
the  FADAC  by  manual  means  or  they  check  the  manual  using  the  FADAC. 

One  could  imagine  the  confusion  resulting  from  the  introduction  of 
the  modern  artillery  systems  which  we  should  have." 


I  can  attest  to  that.  I  sent  back  from  Germany  a  young  officer  to  the  Army 
Materiel Systems Analysis Agency where, working with ballistic experts, he
developed  a  chip  for  a  Texas  Instruments  programmable  hand-held  calculator 
incorporating  the  firing  tables  for  our  medium  howitzers.  The  device  proved 
both  reliable  and  quick.  But  I  can  vividly  recall  walking  into  a  fire  direction 
center  of  a  battalion  firing  at  Grafenwohr  to  find  three  tiers  of  firing  data 
calculators in operation. On tier one of a football-bleacher-like set-up there
were  three  plotters  producing  data  for  the  guns  using  the  graphic  techniques 
that  had  been  used  in  World  War  I  and  World  War  II.  On  tier  two  were  not  one 
but  two  FADAC  operators  checking  the  graphic  data.  And  in  the  back  at  the  top 
of the pyramid was a lad with the Texas Instruments device, invariably producing
his data faster and as accurately as anybody else, but whose data would not be
independently accepted by any self-respecting fire control officer, you may rest
assured.

What I am suggesting is that a technology like electric guns, which could
eliminate chemical energy as a way of propelling projectiles, could also
eliminate an enormous amount of that manpower-intensive logistic tail to which
I alluded at the outset, and may make it possible for us to have a genuinely
distributed force. In future wars we will have to dispense with specialization.
I do not believe that it will be possible to maintain through the 1990's an arm
dedicated, as we now dedicate the artillery, exclusively to the delivery of
indirect firepower. I think all elements of the force are going to have to
be capable of contributing to both direct and indirect firepower, to antiaircraft
and antitank defenses, and to reconnaissance. Hence, our present
specialized, manpower-intensive artillery has got to yield place to multi-purpose
weapons systems.

This is a cartoon suggesting such a future weapon, by William Coulter of
the  Washington  Post.  It  is  not,  I  assure  you,  genuinely  classified.  It 
depicts  a  device  which  I  admire  if  only  because  it  operates  by  ingesting  its 
own  technical  documentation.  I  believe,  as  the  data  I  am  about  to  show  will 
illustrate,  that  such  a  feature  in  a  weapon  may  be  the  only  hope  for  the  Army 
of the future. But I want to use this slide to make a more serious point about
the dilemma that we face: here the artist caricatures one of those omnipotent
machines we might have to develop. I would say that the principal obstacle to
our fielding such a machine is the fellow portrayed sitting on the seat
there  on  the  left:  the  commander,  operator,  maintainer  of  the  device.  Here 
again,  a  quotation  from  a  Defense  Science  Board  report  to  the  Secretary  this 
summer:

"The division of tasks between the man and the machine becomes increasingly
critical in two dimensions. First, there is the problem
of personnel skill potential (quality). Average reading levels and
aptitudes come into play. Secondly, there is the complexity issue.
To the extent that complexity can be engineered away from the man-machine
interface so much the better, if it can be afforded, and if
it is not translated into insoluble problems somewhere back in their
maintenance sequence. More human research and man-machine technology
development is required. The current problems with Built-In Test
Equipment (BITE) illustrate the doubtful state of the art. The division
of maintenance tasks between the diagnostic equipment and the
mechanic or repairman has been tilted too far toward the machine and
they have generally failed to live up to their advanced billing. In
the meanwhile, the people people must interact realistically with
the engineers at the outset. This is an art not yet fully developed.
Testing at the man-machine interface must be conducted and room for
corrective design interactions provided in the development program."

I would be remiss if I did not immediately, of course, acknowledge that,
in addition to the issue of the quality of the men manning weapon systems, we
have a quantity problem in this country. The Soviet Union faces a similar
quantity problem, and so too, I believe, do most developed countries.

But we also have a quality problem, and it may be societal in scope, as
illustrated  by  the  decline  in  Scholastic  Aptitude  Test  results  nationwide. 

But even more disturbingly, recognizing that we are in a serious competition
with the USSR, our high school graduates, compared with Soviet high
school graduates of the past decade, have had far less disciplining in mathematics,
science and other technologically supportive subjects in the course of
their schooling.

I quote here from Dr. Izaak Wirszup, Professor of Mathematics at the
University of Chicago, and Director of the East European Survey of Mathematical
Literature for the National Science Foundation, and also the NSF's Director of
[...]

[...] technical documentation per tank, and now fielding over 40,000 pages per tank. It
is  easy  to  imagine  what  that  means  in  terms  of  complexity  for  the  mechanic  who 
has  to  be  able  to  find  his  path  into  his  technical  documentation  for  fault 
isolation,  repair,  replacement,  etc.  What  such  a  complication  does  in  terms  of 
slowing  the  rate  of  repair,  and  the  reliability  of  the  repair  is  predictable, 
and  the  predictable  is  happening  already  in  some  of  our  current  higher- technology 
weapons  systems. 
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Here  is  a  flow  diagram  showing  the  maintenance  process  for  the  anti-aircraft 
missile,  the'  Improved  Hawk.  Up  at  the  top  you  see  the  Hawk  unit's  own  mainten¬ 
ance,  and  across  the  bottom  the  direct  support  maintenance  performed  exogenous 
to  the  Hawk  unit.  The  highlighted  figures  suggest  that  of  parts  that  were 
sent  from  the  unit  back  to  the  direct  support  level  for  repair,  40%  when  examined 
were  found  to  be  faultless;  that  is  to  say,  almost  half  the  time  the  Hawk  unit 
removed  parts  from  the  weapons  system  and  sent  them  away  to  the  direct  support 
maintenance  unit  on  a  totally  unnecessary  trip.  Moreover,  you  can  see  over  on 
the  left,  when  the  parts  came  out  of  the  direct  support  maintenance  unit  and 
were  returned  to  the  Hawk  unit,  30%  of  the  time  they  didn't  check  out  when 
actually  refitted  to  the  system.  And  up  at  the  top,  the  box  in  the  center 
suggests  that  these  mishaps  are  a  function  of  inaccurate  trouble  shooting,  of 
inadequacies  in  the  built-in  test  equipment,  and  of  personnel  deficiencies  in 
training,  quantity  or  experience. 
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These  data  were  compared  with  the  I-Hawk  experience  of  the  Bundeswehr,  at 
least  with  respect  to  the  no-evidence-of-failure  rate.  There  was  a  dramatic 
difference  of  over  20  times  less  mal di agnosis . 
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its  program  on  Soviet  Applications  of  Computers  to  Management.  In  a  letter  to 

the  National  Science  Foundation,  Dr.  Wirzup  said  this: 

"The  Soviet  Union's  tremendous  investment  in  human  resources, 
unprecedented  achievements  in  the  education  of  the  general  popula¬ 
tion,  and  immense  manpower  pool  in  science  and  technology  will  have 
an  immeasurable  impact  on  that  country's  scientific,  industrial  and 
military  strength.  It  is  my  considered  opinion  that  the  recent 
Soviet  educational  mobilization,  although  not  as  spectacular  as  the 
launching  of  the  first  Sputnik,  poses  a  formidable  challenge  to  the 
national  security  of  the  United  States,  one  that  is  far  more  threat¬ 
ening  than  any  in  the  past  and  one  that  will  be  much  more  difficult 
to  meet. “ 
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I  think  you  are  familiar  with  these  data.  Regardless  of  what  you  may 
think  of  their  precision,  the  trends  are  plain.  The  population  recruited  into 
the  Army  recently  was  different  qualitatively  from  the  population  that  was 
inducted  or  recruited  in  the  wanning  years  of  conscription  in  the  United  States. 
This  is  not  to  say  that  the  Army  cannot  work  successfully  with  a  large  median 
population.  However,  it  is  evident  that  the  Army  encounters  significant 
difficulties  when  attempting  to  assign  masses  of  such  individuals  to  high 
technology  weapon  systems.  I  am  not  talking  about  future  difficulties;  I  am 
talking  about  problems  that  are  here  and  with  us  today.  Reference  was  made 
earlier  to  reading  ability. 
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This  is  what  is  happening  to  technical  documentation  vis-a-vis  airplanes. 

As  you  can  see,  the  Navy's  F-14  fighter  is  going  into  the  fleet  accompanied  by 
some  300,000  pages  of  technical  documentation. 
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And  it  is  not  just  airplanes,  but  tanks:  here  you  can  see  the  Army’s 
starting  early  in  World  War  II  with  something  less  than  1,000  pages  of  tech- 

As you might expect, if an armed force is willing, for a technologically
advanced system such as the Improved Hawk, to invest individuals with 3-year
terms of service, who are all high school graduates, and who have other
advantages over what the U.S. Army has been working with on the left, the
performance of the weapon system is almost invariably going to look better.

Now, it isn't that we are picking less apt soldiers to man the Improved
Hawk as a system. These are Electronic Aptitude scores. As you can see,
compared with the total force, the I-Hawk system is getting a pretty good cut
of the soldiers of higher electronic aptitudes. And, of course, the I-Hawk is
but one of many systems that have a demand for soldiers with these high scores.
Part of our problem is that the tests which produced these scores are normative.
Part of our problem is that even high scoring soldiers may not be up to the
maintenance task to which we have put them. Part of our problem is exactly
that the weapon system may be too smart, and we need to engineer it some more
in order to drive its maintenance back into routines with which we know the
soldiers we have can cope. These are all very, very intractable problems, to
handle which the Army of the future, or the Air Force of the future, or the
Navy of the future, are going to need the skills of you psychometricians far
more than ever in the past.

Slide  33 

What to do? I advocate a manpower strategy for the Armed Forces, in
complying with the guidance of the Secretary of Defense, which would operate on
all three phases or learning regimes of the soldier, sailor or airman.

In the past, of course, the services have focused resources on the center phase,
labeled here the school, and they will have to keep doing so. They have sort
of left it to hope, to chance or to high enlistment bonuses that the Phase 0
product would pan out for them: not a sound approach, given what is happening
in SAT scores. We are going to have to do something about finding individuals
in Phase 0 who are fit to become stellar performers both in Phase I and Phase
II. But it is equally important to devise a plan for Phase II--On the Job. It
has been the custom of the Services of the United States to let Phase II, the
on-the-job phase, take care of itself, relegating what may be the most important
adult educational experience to first-line supervisors largely ill-trained for
teaching, and unaware of their responsibility for same. I suggest to you that
we can no longer depend upon such hit-or-miss methods if we are going to modernize
the force, to bring in large numbers of very technologically advanced weapon
systems. I hold that we must launch now a concerted campaign to intervene
systematically in Phase II, the on-the-job training phase of development for the
soldier, sailor or airman, both to assure functional mastery, and to provide an
ability in the force to handle the influx of newer technologies that will pour
out of on-going research and development programs.

Is there a technological intervention appropriate here? Yes, you may
recall Number 4 from the list of 17 DSB commended technologies, this one. One
last time I quote, if I may, from a Defense Science Board report:

"It should also be the policy of DOD that support will be provided
for these high performance systems at a level which will meet
peacetime operating and training requirements and which also will
provide the base for surging to wartime utilization and sustainment
rates. In wartime intense combat periods, and during peacetime
'surge trials' it will be the objective to move actual field
availability Ao close to Ai, intrinsic availability. Specific
support program goals should be established at the beginning of
development and managed thereafter with the same priority attention
and intensity normally accorded to performance. Training support
goals should relate to higher standards based on advanced training
technology now becoming available. Advanced job-aids should be
designed for simultaneous use in training and on the job. Soon it
will be impossible to maintain this kind of technical documentation
by conventional generation, distribution, and substitution of paper
drawings and text. Digital communication, storage, and display of
changes will be required. This whole area should be promoted during
the acquisition cycle not only by the R&D community, but by personnel
specialists and commanders."

Now, ladies and gentlemen, it is not clear at this time whether the strategy
advocated in the Defense Guidance will enhance the national security of the
United States. Both its technological and sociological feasibility are seriously
in doubt. The manpower policy challenges posed by that strategy are enormous in
their implications. The branchings in the paths ahead are altogether too
numerous for easy mapping or classification. It seems crystal clear, however,
that  most  of  them  involve  choices  that  could  more  confidently  be  made  were  we  to 
have  much  better  information  about  our  manpower  than  is  presently  available 
to  commanders  and  managers  in  the  Department  of  Defense.  Hence,  I  appeal  to 
you,  individually  and  collectively,  to  lend  us  a  hand  with  your  skills.  Upon 
your  response,  the  very  security  of  the  nation  may  rest. 


Adams, Jerome, MAJ, United States Military Academy, West Point, New York, &
Hicks, Jack M., US Army Research Institute for the Behavioral and
Social Sciences, Alexandria, Virginia. (Tues. P.M.)


Performance of Male and Female Cadets during Cadet Field Training


All female cadets (N = 86) and a random sample of male cadets
(N = 49) in the Class of 1982 were compared in terms of the performance
ratings they received at Cadet Field Training (CFT). Statistically
significant results from multivariate analysis of variance indicated
that male cadets were evaluated more favorably than were female cadets.
However, the magnitude of these gender differences was small. Other
effects indicated that: (1) regular Army officers rated cadets less
favorably than did more senior cadets; (2) male raters did not differ
from female raters in how they rated cadets at CFT; (3) cadets in the
squad member role were rated lower than cadets in leadership or
administrative roles; and (4) the difference between ratings of male and
female cadets was less in the squad member role than in either leader
or administrator roles. A sex role stereotype interpretation was
suggested for the gender effect shown with the CFT rating form.
Possible directions for change in the rating form were proposed. Some
cautions regarding source and context effects were suggested for
Academy personnel taking action on the basis of CFT performance ratings.


Leadership Performance Appraisal Ratings
During Cadet Field Training


Jerome Adams
Associate Professor
Science Research Lab, USMA
West Point, NY 10996


Jack M. Hicks
Senior Research Psychologist
Army Research Institute
Alexandria, VA 22333


This paper describes the results of analyses comparing the
performance  appraisal  ratings  given  to  male  and  female  cadets  during 
various  phases  of  Cadet  Field  Training  (CFT).  CFT  takes  place  at  Camp 
Buckner,  a  special  military  field  training  facility  located  on  the 
grounds  of  the  United  States  Military  Academy.  CFT  is  scheduled  for  a 
seven  week  period  during  the  summer  following  completion  of  the 
freshman  year  at  the  Academy.  During  this  period,  the  cadets  are 
introduced  to  an  orientation  into  the  combat  arms  and  combat  support 
branches  of  the  Army,  e.g.  armor,  field  artillery,  infantry, 
engineering,  signal,  air  defense  artillery,  etc.  Consistent  with  the 
physically  demanding  nature  of  being  in  the  combat  and  combat  support 
branches,  the  CFT  training  regimen  is  very  demanding  in  terms  of  the 
level  of  physical  effort  and  endurance  involved.  Long  marches,  daily 
runs  and  calisthenics,  manual  manipulation  of  heavy  construction 
materials,  long  hours  in  hot  armored  vehicles,  eating  meals  in  the 
field  including  field  rations,  and  too  little  sleep  are  just  some  of 
the physical rigors which characterized CFT at the time of this study
in  1979. 

One  important  goal  of  CFT  is  to  develop  the  leadership  skills  of 
the  participating  cadets.  In  line  with  this  goal,  cadets  are  assigned 
to temporary leadership roles during the course of training. For example, a
cadet may serve as squad leader or as a section chief or platoon
sergeant  for  a  temporary  period  in  training  and  then,  at  the  end  of  the 
same  day,  the  cadet  would  return  to  the  role  of  squad  member.  Thus, 
there  is  a  continuous  rotation  of  leadership  roles. 


*The research reported here was supported by grant MDA 903-73-G02
from the U.S. Army Research Institute for the Social and Behavioral
Sciences (Jerome Adams, Principal Investigator).

**This paper represents the views of the authors and not the official
position of the U.S. Military Academy, the U.S. Army Research
Institute, the U.S. Army, or any other governmental agency unless so
designated by authorized documents.

We thank Debra Instone and Robert Rice for their assistance on an
earlier report and for their assistance in data analysis.

At several points during the course of training, the performance
of cadets at CFT is formally evaluated by either upperclass West Point
cadets, or by the regular Army officers that supervise cadet training
(i.e., Tactical Officers). A cadet may be a squad member, a leader,
or occupy a supervisory staff role at the time of these evaluations.

In  this  context,  typical  supervisory  positions  include  transportation 
sergeant,  supply  sergeant,  training  sergeant,  etc.  Such  positions 
carry  considerable  responsibility,  but  are  outside  of  the  direct  line 
of  command  and  have  limited  authority  when  compared  to  line  leader 
roles  such  as  assistant  squad  leader,  squad  leader,  platoon  leader, 
first  sergeant,  etc. 

Their  performance  appraisal  ratings,  developed  and  regularly 
collected  under  institutional  authority,  served  as  the  data  analyzed 
for the present report. The major objective of these analyses was to
compare the ratings given to male and female cadets as they performed
temporary military training exercises in any of these three roles
(i.e.,  member  of  squad,  formal  leader,  or  supervisor). 


METHOD 


Subjects

Performance appraisal ratings were obtained for all female cadets
(N = 86) and a random sample of male cadets (N = 49) enrolled in the
Class of 1982 at the United States Military Academy during the summer
of 1979. All these cadets were participants in Cadet Field Training
(CFT).


Performance  Measures 


An  11  item  rating  form,  developed  and  administered  by  the  Academy, 
was used to evaluate the performance of cadets as they performed
various duties associated with CFT. The forms used by Tactical
Officers (TAC) and more senior cadets holding roles in the Cadet Chain
of Command (CCC) included the same 11 items. Each of the 11 dimensions
used  in  the  rating  form  is  described  below.  (A  copy  of  the  form  is 
included  as  Appendix  A). 

(1)  Sense  of  responsibility  and  reliability  in  the  execution  of 
assigned  tasks. 

(2)  Ability  and  willingness  to  work  in  harmony  with  others. 

(3)  Ability  to  grasp  a  situation,  think  clearly  and  develop 
logical  conclusions. 

(4)  Ability  in  organizing  and  directing  the  efforts  of  others. 

(6) Motivates others by his keen interest and personal
participation.

(7) Initiative, forcefulness and aggression.

(8)  Ability  to  bear  up  under  pressure;  the  will  to  persevere  in 
the  face  of  obstacles. 

(9)  Ability  to  adjust  to  new  or  changing  situations  and  stresses. 

(10) Renders faithful and willing support to superiors and
followers.

(11)  Overall  performance  of  duties. 

The first ten dimensions are rated on a five point scale and the
eleventh  one  uses  a  ten  point  scale. 


Unit of Analysis


Each cadet was rated by one to five evaluators. Hence, the total
number of evaluations was 487 (205 for male cadets, 282 for female
cadets). In the analyses to follow, the evaluation rather than the
individual cadet serves as the unit of analysis (i.e., N = 487, not
135).
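
The counts given above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the variable names are hypothetical, and the per-cadet averages are derived here from the reported totals rather than reported in the paper itself.

```python
# Counts reported in the text; the dictionary names are illustrative assumptions.
cadets = {"male": 49, "female": 86}
evaluations = {"male": 205, "female": 282}

total_cadets = sum(cadets.values())            # individual cadets in the study
total_evaluations = sum(evaluations.values())  # evaluations, the unit of analysis

# Average number of evaluations contributed by each cadet, by gender.
per_cadet = {gender: evaluations[gender] / cadets[gender] for gender in cadets}

for gender, mean_n in per_cadet.items():
    print(f"{gender}: {mean_n:.2f} evaluations per cadet")
print(f"N = {total_evaluations} evaluations from {total_cadets} cadets")
```

Both averages fall inside the stated range of one to five evaluators per cadet. Because a single cadet can contribute several evaluations, the 487 observations are not fully independent of one another, which is worth keeping in mind when reading the significance tests that follow.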


RESULTS 


Intercorrelations 


Table 1 presents the intercorrelations among the 11 questions
making up the rating scale. These intercorrelations are quite high,
suggesting that there is relatively little discrimination in the way
raters respond to these 11 different items. Until recently, regular
Army officer evaluations were subject to a highly skewed (inflated)
performance score. This general leniency error appears to generalize
to cadets, who typically emulate officer role model behaviors.


Analysis  of  Variance 

A multivariate analysis of variance was conducted. In this 2 x 3
x 11 analysis, Cadet Gender (Male-Female) and Cadet Role (Member of
Squad, Leader, Supervisor) served as the independent variables, and the
11 performance ratings served as the dependent variables.


The MANOVA yielded two

univariate analyses were performed.

Because  of  space  limitations,  I  will  only  describe  briefly  the 
univariate results related to each of the three effects tested in these
analyses of the 11 ratings. The complete tabular results are available
in  the  full  manuscript. 


Main effect for Cadet Gender. Seven of the 11 items yielded
significant main effects for Cadet Gender in the univariate analyses of
variance. The direction of these effects consistently showed males
being rated more favorably than females in every instance of a
significant difference. Specifically, male cadets, relative to female
cadets, were rated as:


-  being  more  reliable  and  responsible  in  executing  assigned  tasks 

-  having  greater  ability  to  grasp  a  situation,  think  clearly,  and 
develop  logical  conclusions 


- having greater ability to organize and coordinate the efforts of
others


-  having  greater  capacity  for  increased  responsibility 

-  more  motivating  to  others  through  keen  interest  and  personal 
participation 

-  having  greater  initiative,  forcefulness,  and  aggressiveness 

-  better  in  overall  performance  of  duties 

On only the four following items did the differences in ratings of
male  and  female  cadets  fail  to  achieve  conventional  levels  of 
statistical  significance: 

-  ability  and  willingness  to  work  in  harmony  with  others 

-  ability  to  bear  up  under  pressure;  the  will  to  persevere  in  the 
face  of  obstacles 


-  ability  to  adjust  to  new  or  changing  situations  or  stresses 

- renders faithful and willing support to superiors and
subordinates

However,  for  all  four  of  these  items  the  direction  of  the  means  was  the 
same as that found in the significant effects, i.e., males were rated
higher  than  females. 

Main effects for Cadet Role. The significant multivariate effect
for Cadet Role is reflected by significant univariate effects for 10 of
the 11 rating dimensions. Cadets in the role of squad member were
evaluated least favorably and those occupying an administrative role
were evaluated most favorably. Cadets in positions of formal
leadership at the time they were rated were the recipients of
intermediate evaluations, falling between the other two roles. For
some dimensions the mean ratings of cadets in leader roles fell closer
to the mean for cadets in squad member roles, while for other
dimensions the means for leaders were closer to the means received by
cadets holding administrative roles.

Interaction effect of Cadet Gender x Cadet Role. In the
univariate analyses of variance, six of the 11 dimensions yielded
significant interactions between Cadet Gender and Cadet Role. The
means associated with these effects are presented in Table 2 along with
the results of Newman-Keuls multiple comparison tests (Winer, 1971, pp.
210-219). Those means sharing the same subscript in Table 2 are not
significantly different from each other. Each Newman-Keuls test is
limited to a single dependent variable. Thus, in examining the
subscripts associated with the means for a significant interaction, one
must consider each row of the table separately.

The results of these 2 x 3 interactions are complex. However, it
is  possible  to  offer  some  general  comments  for  these  six  significant 
interactions.  The  smallest  difference  between  the  ratings  of  male  and 
female  cadets  was  found  when  cadets  were  members  of  squad.  None  of  the 
six  dimensions  yielding  significant  interactions  showed  a  significant 
difference  between  ratings  of  male  and  female  cadets  while  in  the  role 
of  member  of  squad.  However,  for  five  of  the  six  variables,  the 
difference  between  ratings  of  male  and  female  cadets  was  significant 
for cadets holding formal leadership roles. In formal leader roles of
this type, male leaders, relative to female leaders, were rated as:

-  having  greater  capacity  for  increased  responsibility 

- having greater ability to organize and coordinate the efforts of
others

-  having  greater  initiative,  forcefulness,  and  aggressiveness 

-  having  greater  ability  to  adjust  to  new  or  changing  situations 
and  stresses 

-  better  in  overall  performance  of  duties 

Regarding the third role, administrator, the pattern of male-female
differences was somewhat mixed. Three variables showed no significant
differences between males and females in this role (capacity for
increased responsibility; adjusting to new situations; and support to
superiors and subordinates). The other three variables yielded
significantly higher ratings for male cadets in administrative roles
than for female cadets in these roles (organizing and directing efforts
of others; initiative, forcefulness, and aggression; overall
performance). In sum, these interactions show little discrepancy
between ratings of male and female cadets in follower roles, but higher
ratings for male cadets in formal leadership roles, and to a lesser
extent higher ratings for males than females in administrative roles.


DISCUSSION


The most noteworthy result of the present study is the consistent
difference in the ratings of male and female cadets. Males were rated
significantly higher than females on seven of the 11 rating dimensions.
Interaction effects indicated that for many of these dimensions, this
gender effect was smallest when cadets were in a squad member role and
greatest when they held positions of formal leadership or
administrative responsibilities. An examination of the correlations
between cadet gender and the 11 rating dimensions presented in Table 1
shows that the gender effects in the present study are small in
magnitude. The largest correlation involving the cadet gender
variable is only .22, indicating less than 5% shared variance (with
item 4, organize and direct others).
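
The "shared variance" figure is simply the squared correlation. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the significance check rests on two assumptions not spelled out in the text above: a sample of 479 evaluations and a conventional two-tailed .05 criterion (a critical t of about 1.96 for large samples).

```python
import math


def shared_variance(r: float) -> float:
    """Proportion of variance two variables share: the squared correlation."""
    return r ** 2


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


largest_r = 0.22  # largest correlation with cadet gender reported in the text (item 4)
print(f"shared variance at r = .22: {shared_variance(largest_r):.1%}")

# Assumed values: n = 479 and a large-sample critical t of 1.96 (two-tailed .05).
assumed_n = 479
critical_t = 1.96
for r in (0.10, largest_r):
    t = t_for_correlation(r, assumed_n)
    print(f"r = {r:.2f}: t = {t:.2f}, exceeds {critical_t}: {t > critical_t}")
```

Under these assumptions a correlation of .22 is clearly significant in a sample of this size, yet it accounts for under 5% of the variance, which is the distinction the paragraph draws between statistical significance and magnitude.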

Many different interpretations could be offered for these small,
but statistically significant, effects showing that males are rated
more favorably than females in carrying out their duties at CFT. One
interpretation of these findings can be offered in terms of sex role
stereotypes. Deaux (1976) indicates that females are expected to be
compassionate, emotional, warm, quiet, gentle, passive and tactful
while males are expected to be analytic, aggressive, independent,
assertive, confident, and direct. Schein (1973) demonstrated the
relevance of sex role stereotypes for concerns with leadership and
managerial roles. She compared the stereotype of males and females
with the stereotype of effective managers. Her finding was that "male"
and "effective manager" share a large number of common stereotypic
attributes while "female" and "effective manager" share few attributes.
The implication of Schein's study is that masculinity may be perceived
as effectiveness in leadership roles while femininity may be perceived
as being less effective.


The Cadet Role effects may reflect nothing more than the
behavioral repertoire appropriate for the different roles. It may be
very difficult to display some of the behaviors and traits included in
the rating scale while in the squad member role. On the other hand,
there are abundant opportunities to engage in such behaviors while in
leadership or administrative roles. No doubt alternative
interpretations could be offered for the phenomena discussed above.
While there remains ambiguity in the theoretical meaning of Cadet Role
and Rater Role effects, the practical implications of these results are
clear. Namely, when acting on CFT performance ratings, Academy
personnel have begun to attend to both the source and the context of
such ratings. The present results show that it is unjustified to
assume comparable meaning for ratings from officers and from cadets, or
for cadets occupying different roles in the CFT program. Before taking
any administrative actions on the basis of such ratings, Academy
officials have come to recognize that it is essential to judge the


the  source  and  role  from  which  the  ratings  are 


References 


Adams, J., Rice, R.W., and Instone, D. The 1979 Summer
Leadership Study: A comparison of male and female leaders
(Tech. Rep. 80-2). Army Research Institute for Social
and Behavioral Sciences, Grant MDA 903-78-GG2, July 1980.

Cohen, E., and Burns, P. SPSS-MANOVA: Multivariate Analysis of
Variance and Covariance. Evanston, Ill.: Vogelback Computing
Center, Northwestern University, May 1976.

Deaux, K. The Behavior of Women and Men. Monterey, Calif.:
Brooks/Cole, 1976.

Rice, R.W., Bender, L.R., and Vitters, A.G. Leader sex,
follower attitudes toward women and leadership effectiveness:
A laboratory experiment. Organizational Behavior and
Human Performance, 1980, 25, 46-78.

Rice, R.W., Yoder, J.D., Adams, J., Priest, R.F., and Prince,
H.T. II. Correlates of leadership ratings for male and
female military cadets. Paper presented at the convention
of the Eastern Psychological Association, Hartford, Conn.,
April 1980.

Schein, V.E. The relationship between sex role stereotypes
and requisite management characteristics. Journal of
Applied Psychology, 1973, 57, 95-100.

Winer, B.J. Statistical Principles in Experimental Design.
New York: McGraw-Hill, 1971.


Table


Note: N = 479-481; r > .10 is significant.


AD  P001284 


Adams,  Jerome,  MAJ,  Richards,  John,  CPT  &  Fullerton,  Terry,  CPT,  United 
States  Military  Academy,  West  Point,  New  York.  (Wed.  A.M.) 


Relationship between Attitudes and Leadership Style: A Policy
Capturing Approach


The present study sought to extend understanding of the relational
qualities of leadership style using policy capturing. The approach
required analysis of the styles of leadership used by battalion
commanders in dealing with problem soldiers. A questionnaire was
administered to battalion commanders to gather information about what types
of personal problems they have found as leaders and what solution
techniques were employed to deal with the problems. Leaders' attitudes
toward soldiers were also obtained as a basis for differentiating
between developmental, punitive leaders, and "administrative" leaders.
Subjects were 326 male battalion commanders assigned to locations
throughout the continental US, Europe, and the Far East. Responses
regarding soldier problems were submitted to a principal component,
varimax rotation factor analysis. The four generic factors identified
were (1) on the job problems, (2) substance abuse problems, (3)
socio-emotional problems, and (4) AWOL. The three leadership styles differed
most in their handling of job performance problems, less regarding
socio-emotional problems, and less yet on substance abuse problems.
There were no discernible differences regarding AWOL.
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Objectives  of  the  Study 

The main objective of this study is to attain more understanding
of the relational qualities of leadership in military
organizations.  These  qualities  are  often  described  in  terms  of 
leader  style  as  in  Fiedler's  contingency  model  (1964);  however, 
the  emphasis  is  on  a  more  focused  view  of  "style"  as  it  relates 
specific  leader  behaviors  to  standard  organizational  problems. 

The  previous  failure  of  most  traditional  personality  measures 
to  predict  leader  behavior  has  caused  many  scientists  to  adopt 
the  newer  view  that  such  behavior  is  an  interactive  function  of 
personality, dispositions, and the situation (Hollander & Neider).
What  seems  necessary  to  further  the  predictive  quality  of  the 
study  of  leadership  in  groups  and  organizations  is  a  better 
grasp  of  these  interactive  qualities  in  a  standardized  social 
interaction  from  a  policy-capturing  perspective.  Therefore,  the 
present  study  seeks  to  extend  the  understanding  of  the  relational 
qualities  of  leadership  using  policy  capturing.  This  approach 
requires  analyzing  the  styles  of  leadership  used  by  commanders 
to  deal  with  problem  soldiers.  Several  important  factors  give 
special meaning to this study:

Commanders are senior leaders in organizations whose
decisions  directly  influence  outcome  measures  of 
subordinate  members. 

The  study  uses  four  standard  categories  of  leader 
problems  which  all  military  commanders  must  face. 

The  study  uses  fourteen  standard  solution  techniques 
to  resolve  the  four  problem  categories. 

The study uses a standard personality attitudinal
measure.

The  study  was  conducted  in  the  real-life  context  of 
more  than  three  hundred  organizations  with  leaders 
performing  similar  organizational  functions. 

The  formal  and  objective  approach  of  policy  capturing  in 
leadership  provides  a  conceptual  framework  of  how  leaders  behave 
in  a  given  situation.  For  example,  we  will  determine  if  each 
military  leader  is  considering  the  same  information  and  placing 
the  same  importance  on  specific  problem  solutions  for  a  given 
set of standard problems. Only a small portion of research concerning
policy capturing has been concerned with industrial
problems (see Christal, 1968; Naylor & Wherry, 1965; Christal, 1969).

Although regrettable, the paucity of research is due to
practical  constraints.  Policy  capturing  methods  require  a  large 
number  of  comparable  judges;  hence,  the  methods  are  typically 
applicable  only  in  large  homogeneous  organizations.  However, 
a  setting  such  as  the  military  would  be  an  ideal  setting  to 
determine  how  similar  leaders  are  in  their  behaviors  toward 
standard  leadership  problems. 

Method 


As  indicated,  the  research  to  be  reported  is  an  attempt  to 
study  leadership  as  an  interactive  process  using  a  policy  capturing 
perspective.  The  technique  used  required  the  administration  of 
a  questionnaire  to  commanders  to  gather  information  about  what 
types  of  problems  they  have  found  as  leaders  and  what  solution 
techniques  were  employed  to  deal  with  the  leadership  problems. 
Information  was  also  obtained  about  the  leaders'  attitudes 
toward  problem  soldiers  as  a  basis  for  differentiating  between 
developmental  leaders,  punitive  leaders,  and  mere  administrative 
adjudication  processing. 

Subjects 


Subjects  were  326  Army  commanders  assigned  to  locations 
throughout the continental United States. The research was part
of a larger program geared to developing more effective unit-level
techniques for addressing the problems among first term
enlisted soldiers.

Procedure 

Military  leaders  were  tasked  to  complete  a  questionnaire 
about  their  attitudes  toward  problem  soldiers  and  what  leadership 
style they used to resolve problem issues. The information contained
in this study is restricted to the military leaders'
responses  to  questions  of  working  with  problem  soldiers. 

In  an  effort  to  reduce  the  number  of  categories  of  soldier 
problems  rated  in  the  survey,  it  was  decided  to  submit  the  leaders' 
responses  to  questions  about  problem  soldiers  to  a  principal 
component,  varimax  rotation,  factor  analysis.  This  analysis 
suggested  that  four  factors  be  retained.  The  four  retained 
factors were thus conceptualized as: I - On the Job Problems,
II - Substance Abuse Problems, III - Socio-Emotional Problems, and
IV - AWOL. Together the factors accounted for 55.7% of the
observed variance. These dimensions permitted logical groupings
of the original 14 problems into four supra-ordinate, broader
problem  areas.  These  clusterings  were  then  imposed  on  two 
related  survey  sections  dealing  with  the  perceived  frequency 
of  soldier  problems  and  the  estimated  likelihood  of  problems 
ultimately  resulting  in  separation  from  service. 
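
The 55.7% figure is the usual proportion-of-variance summary for a principal component solution. When components are extracted from a correlation matrix of standardized items, the eigenvalues sum to the number of items, so the share explained by the retained factors is the sum of their eigenvalues divided by that total. The short Python sketch below illustrates the arithmetic only. The eigenvalues are hypothetical values chosen for illustration (the paper does not report them), and the function name is our own.

```python
def variance_explained(eigenvalues, n_retained):
    """Proportion of total variance carried by the first n_retained components.

    Assumes the components come from a correlation matrix, so the
    eigenvalues sum to the number of items analyzed.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return sum(ordered[:n_retained]) / total


# Hypothetical eigenvalues for 14 problem items (they sum to 14.0).
# These are NOT the values from the study; they are illustrative only.
eigenvalues = [3.2, 1.9, 1.5, 1.2, 0.9, 0.8, 0.8, 0.7, 0.7, 0.6, 0.5, 0.5, 0.4, 0.3]

share = variance_explained(eigenvalues, n_retained=4)
print(f"{share:.1%} of variance retained by four factors")
```

With real data the eigenvalues would come from the factor analysis itself; only the final division is shown here.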

In  order  to  decrease  the  number  of  solution  categories 
to  problems,  it  was  determined  to  factor  analyze  the  solutions 
to  four  problems:  job  performance,  substance  abuse,  marital 
problems  and  AWOL.  These  four  problems  were  chosen  because 
they  best  represented  (by  virtue  of  factor  loadings)  the  four 
supra-ordinate  categories  of  soldier  problems.  This  factor 
analysis  and  rational  consideration  suggested  the  following 
further supra-ordinate solution categories: 1) Informal
Counseling; 2) Non-punitive Aids to Military Adjustment;
3) Remedies for Substance Abuse Problems; 4) Punitive Non-discharge
Remedies; 5) Early Discharge Program; 6) Separation
for University; and 7) Other Discharges. The next set of
analyses describes the attempt to determine predictors of
personnel management style. Commanders' demographic,
attitudinal, and personal characteristics were entered as
predictors into a series of multiple regressions to see how
well they predicted leader style. Leader style is
a criterion measure for each commander. This score is computed
by creating a 7 x 4 matrix of solution categories by problems.
Thus, each commander has a score in each of the 28 cells of
this solution/problem matrix. The proportion of
variance accounted for by the predictor variables was, as expected,
minimal (R² = .013 to R² = .109).
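
The criterion score described above can be pictured as a small table per commander, with the seven solution categories as rows and the four representative problems as columns. The sketch below assumes that each commander's questionnaire yields (solution, problem) pairs showing which remedies he reported using for which problem. The paper does not say how cell values were scored, so the 0/1 coding, the function name, and the sample responses are illustrative assumptions.

```python
SOLUTIONS = [
    "Informal Counseling",
    "Non-punitive Aids to Military Adjustment",
    "Remedies for Substance Abuse Problems",
    "Punitive Non-discharge Remedies",
    "Early Discharge Program",
    "Separation for University",
    "Other Discharges",
]
PROBLEMS = ["Job Performance", "Substance Abuse", "Marital Problems", "AWOL"]


def style_matrix(responses):
    """Build a 7 x 4 solution-by-problem matrix for one commander.

    responses: iterable of (solution, problem) pairs the commander endorsed.
    Cells are coded 1 if endorsed and 0 otherwise (an assumed coding).
    """
    matrix = [[0] * len(PROBLEMS) for _ in SOLUTIONS]
    for solution, problem in responses:
        matrix[SOLUTIONS.index(solution)][PROBLEMS.index(problem)] = 1
    return matrix


# Hypothetical responses for one commander (illustrative only).
responses = [
    ("Informal Counseling", "Job Performance"),
    ("Informal Counseling", "Marital Problems"),
    ("Punitive Non-discharge Remedies", "AWOL"),
]
matrix = style_matrix(responses)
cells = [value for row in matrix for value in row]
print(len(cells), "cells;", sum(cells), "endorsed")
```

Flattening the matrix gives the 28 cell scores per commander that serve as criterion measures in the regressions.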


The next set of analyses allowed us to examine the relational
qualities of traits, behavior, and situation from
a policy capturing perspective. In order to accomplish this,
a series of discriminant analyses was performed. The results
are  given  in  Table  1. 

The  results  of  all  of  the  analyses  are  described  in  more 
detail  in  the  full  paper.  Because  of  space  limitations,  we  will 
only  summarize  our  findings  as  the  data  edify  the  complex 
relational  qualities  of  leader  attributes,  style  and  work 
situation.  First,  the  attempt  to  predict  leader  behaviors 
from  personality  information  was  not  successful.  This  outcome 
is  consistent  with  early  research  by  the  trait  theorists  who 
tried  to  characterize  effective  versus  ineffective  leaders 
based  upon  personal  characteristics.  Among  the  characteristics 
of  the  leaders  studied  here,  the  best  predictor  of  leadership 
behavior is attitude toward problem soldiers. It is
that characteristic which is most clearly important and salient
to the situation. Sherif (1948) anticipated results such as
those presented here when he observed that leadership is not
determined by "personal qualities in the abstract" (p. 456).
Finally, these results are also consistent with Stogdill's (1974)
observation that leadership performance is most often determined
by a pattern of personal characteristics in a situationally
specific setting. Hence, contributing personal characteristics
are  dependent  upon  the  situation. 

Such  preliminary  findings  have  led  us  in  this  research 
to  consider  the  relational  qualities  of  personal  characteristics 
with  a  behavioral  outcome  using  homogeneous  problems  in  similar 
role  level  settings  as  exemplified  in  the  highly  structured 
organization  found  in  the  military. 

Consistently, we were able to significantly discriminate
between leaders whose behavior toward standard problems was
developmental, punitive, or, in the last case, merely administrative.
Generally speaking, we found support for differences
in how military leaders respond to the same problems. The leaders
who were classified as developmental were most distinct in how
they behaved, as evidenced by the solutions they chose to deal
with  problem  soldiers.  This  suggests  a  high  degree  of  similarity 
of  value  or  judgement  among  developmental  leaders  in  terms  of 
consistency  of  behavior  when  addressing  realistic,  standard 
problems  in  common  organizational  settings. 

The  perspective  of  policy  capturing  serves  as  a  useful  method 
in  revealing  that  overall,  developmental  leaders  make  similar 
judgements  on  appropriate  behavior  for  areas  where  the  problem 
soldier  can  be  given  the  opportunity  to  improve  and  possibly 
make long-term meaningful contributions to the organization.
These leaders' actions are highly consistent and underscore a
value system that leads to judgemental consistency in their
behavior. Obviously, punitive leaders, who do not espouse the
same personal developmental attitudes as a dispositional characteristic,
differ from those who do in terms of the degree to which
the relational qualities of traits and styles affect decisions
for standard problems in homogeneous settings. Where we expected
to find the least difference between developmental and punitive
leaders (in the administrative choice outcomes), the evidence
substantiates this similarity.

Because alternative explanations can never be entirely
ruled out, it is possible that the trait-style-situation
leadership link is actually due to some extraneous factor.
However, the work of Kelley (1972) suggests three major dimensions
for such an intuitive examination. If the evidence does not
hold up over time or across relevant situations, or if it is
not supported by the opinions of other relevant actors, leader
phenomena may plausibly be attributed to some exogenous source.
In our study, the results were based upon leader style and
situations for periods ranging up to eighteen months, across
more than three hundred similar levels of authority governed
by the same standard procedures of the organization's hierarchy.
It is not known at this time how the leaders, themselves, would
support the conclusions of this study. However, if we accept
that these findings accurately reflect the relational qualities
of leader traits and styles to a standard set of problems in
the same situation, these results suggest that policy capturing
is a conceptually meaningful methodology to better explain the
process of leadership in specific organizational settings.


Affourtit, Thomas D., Interaction Research Institute, Inc., Fairfax,
Virginia.  (Wed.  A.M.) 


The  Leadership  Evaluation  and  Analysis  Program  (LEAP):  Validity  and 
Future  Directions 


The LEAP is a self-applied OD method that allows small unit commanders
to assess leadership concerns, measure unit combat readiness,
and  evaluate  decision-making  effectiveness.  The  self-development 
strategy  assures  individual  command  control  and  confidentiality,  while 
central  analysis  is  made  possible  through  voluntary  and  anonymous 
submission  of  data. 

Production rates, reenlistments, and absenteeism were used to
validate LEAP Interaction Inventory scales and a Disparity Index
(DI) was developed as an additional dimension of command climate to
measure disunity and differential treatment of unit members. The DI,
the strongest predictor of performance, is the subject of planned
research in working relationships using the Vertical Linkage Dyad model
of analysis.


THE LEADERSHIP EVALUATION AND
ANALYSIS PROGRAM (LEAP)


Validity  and  Future  Application 

Thomas D. Affourtit
Interaction  Research  Institute,  Inc. 


The  Leadership  Evaluation  and  Analysis  Program,  commonly 
known as the LEAP, was originally designed for the U.S. Marine
Corps. 


•  The LEAP is designed for use by the small unit
commander to identify leadership concerns, to
measure overall unit combat readiness, and to
evaluate the effectiveness of the decision-making
process.

•  The LEAP provides company, battery, and squadron
level commanders with a completely decentralized
leadership aid. The program is self-applied and
voluntary on the leader's part, and the results
are strictly confidential to the individual command.

•  Most important, through the decision-making feedback
principle, the LEAP aids the leader in
developing the flexibility to deal with various
groups in a variety of situations and mission
requirements.


The LEAP is basically an intelligence-gathering process,
founded on the principles of modern management methods, and
utilizing behavioral science techniques. The entire program
is presented in a manual that features a step-by-step procedure
for administering the technique and explicit guidelines for
scoring and reviewing the results. Therefore, no outside professional
assistance is necessary and no reports or formal
paperwork are required.

Program  materials  allow  the  commander  to  systematically 
measure  unit  performance  in  terms  of  general  Marine  Corps 
standards and specific unit requirements. In addition, unit
motivation is measured to determine the reasons behind various
levels of performance. Since performance is the consequence of
a motivational state, the LEAP attempts to measure the causes
as well as the effects of unit combat readiness.

Once  the  causes  of  performance  levels  are  identified,  the 
leader can take the necessary action to extinguish those conditions
that produce deficient behavior, and reinforce or support
the  conditions  that  promote  positive  performance. 

The technique for measuring unit motivation is a questionnaire
that functions much like a starlight scope. It enables
the leader to see the hidden causes of performance, which are
not easily observed under normal conditions. In this way, the
LEAP procedure is not unlike counterinsurgent or search-and-destroy
operations. But, in this case, the enemy is indifference,
negligence, discord, and prejudice. These are some of
the  reasons  that  separate  effective,  combat-ready  units  from 
ineffective  ones. 

The  application  of  the  LEAP  is  very  simple,  requiring 
only a few minutes of the commander's time to request application,
to review the results, and then to make appropriate
counteractive  decisions. 

First,  the  commander  has  the  unit  clerk  record  various 
performance  statistics  on  the  Leadership  Analysis  Form,  a 
behavioral  measuring  technique  and  an  essential  part  of  the 
program. This part is easily accomplished since all the information
needed is available from unit records. It takes about
an  hour  to  accomplish. 

Next, the CO directs a responsible subordinate leader to
administer the Interaction Inventory, the motivational questionnaire,
to the entire command to measure the motivational
level of several relevant areas of Marine Corps concern. It
takes from 15 to 35 minutes for troops to complete the anonymous
questionnaire. Questionnaire results may be either scored
manually or computer processed.

Finally, the CO reviews the results of the questionnaire
according to a scheme outlined in the manual, or according to
his or her own interest. These results serve as benchmarks for
the command which are used to judge progress in critical leadership
areas.

Now,  let's  take  a  closer  look  at  how  the  LEAP  works. 

The  CO  calculates  his  or  her  own  performance  profile,  a 
procedure that is similar to developing the readiness indicators
used at some higher level commands, except that unit
leaders select their own areas of importance, and unit measures
are taken over a designated period of time.

The command motivational profile is then reviewed to determine
the areas of strength and weakness in terms of troop perceptions.
Several general areas are measured, such as command
efficiency and cohesion, which indicate levels of unit preparedness.
Conditions of discrimination, justice, and intergroup
climate provide an assessment of overall command equality.

In  addition,  two  adjunct  survey  instruments  have  been 
developed to expand the domain of the motivational data and to
provide  flexibility  of  application  through  selection  of  scales 
considered  most  relevant  at  the  user  unit  level.  Adjunct 
scales  include  measures  of  senior  proficiency,  senior  support, 
communication  flow,  organization  and  planning,  recognition, 
discipline,  task  satisfaction,  task  significance,  functional 
readiness,  solidarity,  and  individual  development. 

Like  an  aerial  reconnaissance  photo,  this  information 
can  be  amplified  to  expose  specific  issues  and  conditions  that 
make up each scale in the command motivational profile. Again,
strengths  and  weaknesses  are  noted  as  peaks  or  depressions 
along  a  scale  graph.  Tactically  speaking,  this  information 
gives  the  coordinates  of  the  enemy's  position. 

These  data  also  allow  the  leader  to  establish  mission 
objectives or goals to reach in order to destroy the negative
condition  and  improve  unit  military  status. 


Validity  studies  conducted  on  Marine  Corps  field  units 
show that as these scale scores move toward the right, indicating
more positive motivational levels, unauthorized absences
within a command significantly decrease, and first-term reenlistment
rates significantly increase.
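
A validity relationship of this kind is typically summarized as a correlation between unit scale scores and a unit outcome rate. The sketch below computes a Pearson correlation in plain Python to show the direction of the reported relationship. The paper does not state which statistic was used, and the five units and their values are hypothetical, so this is an illustration under those assumptions, not a reanalysis.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical unit-level data (illustrative only): mean scale score
# per unit and unauthorized absences per 100 Marines.
scale_scores = [2.1, 2.8, 3.0, 3.6, 4.2]
ua_rates = [9.0, 7.5, 7.0, 4.0, 3.5]

r = pearson_r(scale_scores, ua_rates)
print(f"r = {r:.2f}")  # negative: higher scores go with fewer absences
```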

The  motivational  information  can  also  be  analyzed  from 
the  standpoint  of  any  number  of  groups  within  the  command, 
such  as  senior  vs.  subordinate  Marines,  minority  vs.  majority 
Marines, or by educational level, marital status, and career
intention  groups,  just  to  name  a  few. 

It  has  been  found  that  where  disparity  exists  between 
groups  distinguished  by  rank,  such  as  sergeants  and  below  vs. 
staff NCO's and above, absenteeism will again be high and
reenlistments will be low. But, when the scores of the two
groups move closer together, indicating that both groups judge
the command as functioning at the same level, unauthorized
absences again decrease significantly and reenlistments
increase significantly. The same outcome was discovered for
differences in perception between minority and majority member
troops.
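
The idea of two groups' scores "moving closer together" can be made concrete with a simple gap measure. The sketch below takes disparity to be the absolute difference between the two groups' mean scores on one scale. This formula is an assumption made for illustration, since the paper does not give the computation behind its Disparity Index, and the scores shown are hypothetical.

```python
from statistics import mean


def disparity_index(group_a_scores, group_b_scores):
    """Absolute gap between two groups' mean scores on one scale.

    This formula is an illustrative assumption, not the published DI.
    """
    return abs(mean(group_a_scores) - mean(group_b_scores))


# Hypothetical scores on one scale (1 = negative, 5 = positive).
sergeants_and_below = [2.0, 2.5, 3.0, 2.5]
staff_ncos_and_above = [4.0, 4.5, 4.0, 3.5]

di = disparity_index(sergeants_and_below, staff_ncos_and_above)
print(di)
```

Under this reading, a value near zero means both groups judge the command as functioning at about the same level.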

In  review,  there  are  just  four  primary  steps  in  the  LEAP 
procedure:

•  First, identify leadership problem areas or locate
enemy positions using the LEAP tactical sensing
and reconnaissance devices (analyze).

•  Second, establish leadership goals and management
objectives (plan).

•  Third, take corrective action (attack).

•  Fourth, evaluate results in terms of performance
and/or motivational outcomes (evaluate).

In  this  same  manner,  any  leadership  decision,  program  of 
training, or policy designed to improve performance can be
evaluated. Such decision-making evaluation assists Marines in
producing viable solutions to some of the contemporary problems
that all Marines face today.

In support of this solution-oriented approach, an
information-processing data bank has been initiated. The LEAP
Network Monitor System is a process that will enable Marines
to share solutions to common problems encountered in the field.
This system will function on a voluntary and an anonymous basis
as a data input and inquiry bank for Marines using the LEAP.
Information input, recommendations, and solutions discovered
can be analyzed, and data feedback can be presented in consideration
of any number of influencing conditions, such as
unit composition, mission, unit status, location, or effective
strength. Such results, based on actual conditions, will also
be beneficial in training new leaders to make the most appropriate
and effective decisions prior to command assignment.

In  essence,  the  LEAP  supports  one  of  the  most  basic  and 
time-proven  leadership  principles  --  know  your  personnel!  It 
doesn't  matter  whether  your  leadership  style  is  authoritarian, 
participatory, or charismatic; you can benefit from a better
understanding  of  unit  personnel  and  more  systematic  knowledge 
of  how  members  react  in  various  situations. 

In  summary,  the  LEAP  is  a  voluntary  aid  for  company  level 
commanders.  And  the  payoff  for  just  a  little  time  and  effort 
invested  can  be: 

•  Greater  unit  harmony  and  morale. 

•  Performance improvement.

•  Less disciplinary effort expended.

•  Increased control and influence over troops.

•  Leadership skill development.

Validity  studies  conducted  over  a  five  year  period  revealed 
significant  relationships  between  various  LEAP  scale  scores  and 
unit reenlistment and unauthorized absence rates, first term and
career reenlistment intention, drug and alcohol abuse, and theft
within participating commands. In addition, a longitudinal study
of aviation maintenance commands showed that LEAP motivational/climate
measures correlated significantly with such productivity
indices as defect rates, parts and maintenance backlog, and inventory
readiness.

One of the major outcomes of the LEAP studies was the
development of the disparity index (DI) as a measure of differential
working relationships between pertinent groups within
commands.  Beyond  measuring  the  degree  of  unity  within  a  command 
over  critical  conditions,  the  DI  was  a  more  efficient  predictor 
of  climate  and  performance  for  some  of  the  LEAP  scales. 

A  proposed  future  effort  involves  using  the  LEAP  methodology 
to  investigate  the  impact  of  internal  role  relationships  upon 
unit  efficiency.  Intraunit  differentiation  as  measured  by  the 
DI  is  a  common  condition  that  has  a  significant  impact  upon 
group performance. While traditional attempts to study leadership
focus on average style and treat within-group variance as
error variance, the DI model considers intragroup variance as
valid.  Any  serious  consideration  of  unit  cohesion,  adaptation, 
and  internalization  of  military  norms  must  take  into  account 
that  leaders  differ  over  time  and  within  groups  and  that  groups 
respond  differently. 

As a flexible and valid organizational development intervention
and research tool, the LEAP provides field commanders with
a procedure for improving unit combat readiness, and offers
scientists a unique method of data gathering and analysis to
study the complex phenomenon called leadership.


THE JOB TASK ANALYSIS/SKILLS AND KNOWLEDGE MARRIAGE (PART 1)


T.  M.  Ansbro  and  W.  A.  Hayes 

CNET/TAEG  NEPDIS  Development  Team,  CNET  HQ.  NAS  Pensacola 


As job/task analysis methodology continues to advance in sophistication,
the computer takes over an ever greater share of the work of analysis, leaving
less and less of formerly judgemental areas to assumption. But assumption
still functions where job "skills and knowledges" are assigned as underlying or
component to tasks in inventories.


So far, in front-end analysis of the workplace, task interrelationships,
rankings by complexity, and degrees of commonality can be readily determined by
the computer. If task 00165 in package 017 proves common to 16 others in an
inventory of 2500, has a complexity index of 1.25, and embodies all the subordinate
work behaviors of 137 other tasks, the computer can record these features
and position the subject task appropriately in an output hierarchy. It
can also sort on the basis of identifying or descriptive data included in the
task record in the inventory. Such processing gets pretty far down into specifics
of work behaviors underlying tasks, but it doesn't affix identified manipulative
or processing skills and static-descriptive or process-associated
knowledge (information) elements to those tasks.

This paper (Part I) describes a matrix of "skills and knowledge" elements
to augment a model front-end job/task analysis subsystem (NEPDIS, the Naval
Enlisted Professional Development Information System) and discusses such alternatives
as adding these data to the master job/task inventory or providing an
ancillary "skills and knowledge" inventory for use of the training program
developer.

As "front-end" job/task analysis methodologies have progressed and
continued to advance in sophistication, the computer takes over an ever greater
share of the work of analysis, leaving less and less of formerly judgemental
areas to panel-of-experts analysis, evaluation, and cataloguing. Identification
and description of tasks in inventories appear fairly concrete, as they
do for such task component work behaviors as task elements (Johnson and
Richmann, 1975). What analysis is possible so far with the task (and its
accompanying descriptive factors) as the sole data source has yielded task complexity
indexes, hierarchies expressive of task interrelationships, and task
commonality indicators within and among occupational fields. With such outputs
of job/task analysis producible by computer programming, assumption and
subject-matter expert (SME) consensus might well be expected to fade into the
background. However, assumption still functions where such work behaviors as
job "skills and knowledges" are or must be assigned as underlying or component
to tasks in inventories.

In  Navy  manpower  management,  ship  and  squadron  manning  documents  and  job 
(billet)  descriptions  are  dependent  in  the  main  upon  extensive,  detailed,  and 
comprehensive inventories of operational, maintenance, administrative, military
watchstanding, and other tasks (as well as ship, systems, and equipment data).
Personnel distribution, rating assignment, advancement, and training (especially
training) depend upon inventories of skills as well as tasks; and the
training community needs to take the job/task inventory "audit trail" down
farther still, to the level where "job knowledge" can be associated directly
with "job skills" to support job tasks.

This paper (Part I) describes an attempt to produce a matrix of "job skills
and knowledges" elements to augment a model front-end job/task analysis subsystem
currently under development by the staff of the Chief of Naval Education and
Training (CNET) and Training Analysis and Evaluation Group (TAEG) elements in
Pensacola, Florida. The subsystem model is the Training Analysis Subsystem of
the  Naval  Enlisted  Professional  Development  Information  System  (NEPDIS)  (Davis, 
1976,  1977,  1977a,  1977b). 

The  NEPDIS  model  currently  stores  some  23,000  Naval  Avionics  rating  tasks 
in its inventory. Occupational data acquisition was accomplished for other
ratings in the Naval aviation community, ten in all, but these data are not yet
in the computer. As a result, occupational data stored and analysed to date by
NEPDIS  remain  in  the  major  functional  category  of  Maintenance.  A  typical 
listing  of  an  avionics  maintenance  task  is  shown  in  Figure  1. 

With computerized analysis of such data as typified by this entry in the
Aviation Electrician's Mate (AE) Job/Task Inventory under NEPDIS, tasks may be
ranked by complexity, and degrees of commonality (from the identical to an
agreed-upon level of similarity) (Davis and Perry, 1980) may be established.
Task interrelationships may also be established. Some tasks obviously contain
many component work behaviors that are also contained in other tasks of lesser
complexity and scope; some tasks duplicate the work behaviors of others,
sometimes regardless of the task titles involved. Tasks shown to "embody" other
(subordinate or component) tasks are termed "Omnibus" tasks; the tasks shown to
be component to the Omnibus tasks are termed "Embodied" tasks (Figure 2). In
the NEPDIS Job/Task Inventory (JTI), AE task 00165 in package 017 may be shown
to have a complexity index of 1.25, prove common to 28 others in a one-rating
inventory of 3700 tasks, and embody all the component work behaviors of 137
other tasks. The computer can record these features after producing them via
analysis, and it can position the subject task appropriately in any specified
output hierarchy. It can also sort on the basis of identifying or descriptive
data included in the task record in the Inventory (Ansbro, 1978). Such
processing reaches well down into the specifics of work behaviors underlying
tasks, but it does not extend beyond identifying job-related skills. Figure 3
shows a task "signature block" (work-behavior descriptive data) printed out. The
five skill areas included in the task signature block contain statements of
work behavior that would appear to be as descriptive of elements (or
components) of a task as of the skills that they represent. They are
definitive, small in compass, and specific to (and therefore underlie) task
performance; they are therefore attached to other descriptive data for the task.
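
The Omnibus/Embodied relationship and the notion of commonality can be pictured
as set operations on component work behaviors. The sketch below is only an
illustration under stated assumptions: the task codes and behaviors are
invented, and the Jaccard index stands in for the "agreed-upon level of
similarity," which the text does not define. It is not the NEPDIS algorithm.

```python
from typing import Dict, FrozenSet, List

# Hypothetical task records: task code -> component work behaviors.
# Codes and behaviors are invented for illustration, not drawn from NEPDIS.
TASKS: Dict[str, FrozenSet[str]] = {
    "T001": frozenset({"select references", "use multimeter",
                       "solder joint", "replace relay"}),
    "T002": frozenset({"use multimeter", "solder joint"}),
    "T003": frozenset({"select references", "use multimeter"}),
    "T004": frozenset({"inspect connector"}),
}


def embodied_by(omnibus: str, tasks: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return tasks whose work behaviors are all contained in `omnibus`."""
    behaviors = tasks[omnibus]
    return sorted(
        code for code, other in tasks.items()
        if code != omnibus and other <= behaviors  # subset test
    )


def commonality(a: str, b: str, tasks: Dict[str, FrozenSet[str]]) -> float:
    """Share of work behaviors two tasks have in common (Jaccard index).

    The Jaccard index is an assumption made for this sketch.
    """
    union = tasks[a] | tasks[b]
    return len(tasks[a] & tasks[b]) / len(union) if union else 0.0


def common_to(code: str, tasks: Dict[str, FrozenSet[str]],
              threshold: float) -> List[str]:
    """Return tasks whose commonality with `code` meets the threshold."""
    return sorted(
        other for other in tasks
        if other != code and commonality(code, other, tasks) >= threshold
    )
```

In this toy inventory, T001 plays the Omnibus role for T002 and T003, while
T004 shares no behaviors with it and stays outside the relationship.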

Herein  lies  a  problem  for  the  training  program  developer  or  curriculum 
designer.  To  design  a  training  course,  he  needs  an  inventory  of  tasks  to 
describe  course  graduate  job  performance  capabilities  and  to  provide  realistic 
practical exercises and performance tests. Successful student/graduate task
performance, when matched with on-job (billet) requirements in the fleet (also
tasks), serves as a reasonable predictor of successful performance on the job.

However, can a course cover all the tasks that the graduate must perform on
the job in his fleet assignment? The best that we can hope for is coverage of
those tasks that best represent fleet requirements. The analysis that results
from this realization requires intricate grouping and cataloguing of work
behaviors.  Selection  of  representative  tasks  really  involves  selection  of 
those  underlying  behaviors  component  to  or  most  widely  transferable  among  tasks 
assumed  to  be  representative  of  fleet  job  requirements. 

The transferable component/supporting work behaviors underlying task
performance are skills. Skills, stated in behavioral action language, resemble
tasks. Indeed, the workplace and the schoolhouse both use task and skill
terminology almost interchangeably. As an example, welding is described as a
transferable skill, since welding something to something else is component to
performing many metal fabrication and repair tasks. However, depending upon
how a work-behavior statement reads ("weld fire-hose support bracket to
bulkhead"), welding may be regarded as the action part of a task statement.

Welding as a major work behavior can also be regarded as more encompassing than
a task; it can represent an entire worker career, or the sole mission or output
of a shop or department. Soldering, somewhat similar technically, but a
considerably smaller skill, is usually termed just that: a skill. Viewed from a
task-descriptive orientation, it is also a task element. But, because of its
simplicity, subordinate/component nature, and wide applicability (therefore
transferability) to task performance, it is generally considered to be a skill,
and in the occupational field(s) of electricity/electronics, a basic one, at
that.

On the premise that the schoolhouse is to train the graduate to perform
on the job in the fleet, the instructional program/course designer must attempt
to replicate the most representative and most critical tasks from that target
environment; then, he must ferret out, verify, and train on those skills found
to be component to the tasks selected for training and transferable to those
known to exist in the workplace but not selected for training. Therefore, the
training community needs extensive and comprehensive JTIs with task-descriptive
data as fully fleshed out as possible; and it is certainly more than merely
convenient to have skills appropriately identified as such. Further, in any
career continuum, it must follow that in the earlier training programs (basic,
apprentice, initial job-entry), the concentration of training effort is on
skills, transferable to the job environment where they may be applied and
refined in a work-and-OJT setting. Advanced specialized training still teaches
skills, but task performance figures somewhat more prominently, as higher level
technician training more closely approximates the real-world environment of the
graduate's ultimate work site.

As mentioned earlier, skill statements resemble task (and task element)
statements: there are action verbs, objects of action, and job-environment
conditions and work-performance standards. It is necessary to make one clear
distinction if there is to be any observable difference between these
statements (in an inventory). In the task statement, the object of the action is
specific: a clearly identified or coded item (system or subsystem component,
equipment item, part, form, machine, instrument, etc.). In the skill statement,
the object of the action is not a specific item; it can be typical of a group
or class, a "generic" item (mild steel plate, galvanized sheet metal ducting,
bar stock, tubing, circuit wiring, solid state printed circuits, etc.), even a
synthesized or composite item generated for such a purpose as training or
practice of an identified skill action. As tasks fall into hierarchies, so also
do skills. A "troubleshooting" task (in NEPDIS: "Isolate Fault/Troubleshoot -
object") employs subordinate (component) "troubleshooting" skills:
selecting/using references, selecting/using tools, selecting/using support
materials, selecting/using support equipment, and selecting/using test
equipment.

The principal mechanistic reason for making these task/skill distinctions
in the NEPDIS Training Analysis Subsystem is the need for the computer to
recognize the distinctions in its progress through analysis toward such outputs
as billet-specific task inventories and rating-specific skill inventories.
NEPDIS front-end analysis was designed to be totally computer-served and to
conduct all job/task/skill analysis for designated users at the "front end";
hence, the emphasis on coding, detailed identification and descriptive factors,
and other aspects of an audit trail from task identification through reference
source.

Job knowledge, or the task/skill performance enabling base
(information/data), lies at the bottom of the audit trail. Information from all
reference sources pertinent to task/skill performance can be assumed to fit into
a relatively simple matrix, an example of which is shown in Figure 4. A
NEPDIS-conducted literature search based on reference text support of
occupational data already in the JTI indicated substantial reference support for
the details of task and task element performance. However, little text support
for those behaviors identified as component skills was found in these
references. For instance, what to solder and at what point to solder, and what
tools and materials applied to the task element, were amply covered by
reference. How to solder was not. Hence, in a Navy-designed front-end job/task
analysis subsystem built to support Instructional Systems Development (ISD), the
subsystem developers discovered that in some instances they had provided
themselves with relatively light direct reference text support for developing
the essentials of some skills training. By tracking back through the
appropriate job task-oriented references and by recourse to rating-specific
texts and existing skills-training school texts, the necessary reference
support can be found. But it is not direct, and it is not totally and
specifically contained within the master JTI for the rating or occupational
field.

NEPDIS front-end analysis had been designed with the avowed intent
of keeping all the job/task (and skill) analysis at the front end.
An instructional systems developer was to receive all the various
inventories (subsystem outputs) needed to develop curricula, instructional
programs, etc., without having to "go back to the front end"
himself to analyze or further analyze data; especially, he should not
have to collect raw data.

One simple solution is to add such text references in the appropriate spot
in the task identification block in the master JTI. Another is to provide a
brief structured addendum to the master JTI, this item strictly for the use of
training program development personnel. Figure 5 illustrates the general scope
of basic supporting information useful to the developer of skills training. In
essence, this example would suggest the beginning of an adjunct task/skills
performance-supporting information inventory or skills and knowledge library.

A third alternative is to construct a matrix such as the one shown (in
concept) in Figure 4, and code it to the task signature block in the JTI. The
matrix generally illustrates the task support hierarchy: from the top down,
task performance is supported by any number of task elements; the task elements
are supported by (and incorporate) manipulative and/or information-processing
skills; and these skills are supported by static descriptive and
process-associated knowledge items (the enabling information base). For
practical incorporation into the JTI, identified skills can be cross-coded to
task identification codes, and bodies of information identified as
skill-supporting items.
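
The four-level support hierarchy and its cross-coding can be sketched as three
lookup tables linked by codes. This is a minimal illustration, assuming invented
identifiers (the "AE-", "E", "S-", and "K-" codes below are hypothetical) and
does not reproduce the actual JTI record layout.

```python
from typing import Dict, List, Tuple

# Hypothetical cross-coded tables; every code here is invented for the sketch.
TASK_ELEMENTS: Dict[str, List[str]] = {
    "AE-0001": ["E1", "E2"],
    "AE-0002": ["E3"],
}
ELEMENT_SKILLS: Dict[str, List[str]] = {
    "E1": ["S-SOLDER"],
    "E2": ["S-SOLDER", "S-METER"],
    "E3": ["S-METER"],
}
SKILL_KNOWLEDGE: Dict[str, List[str]] = {
    "S-SOLDER": ["K-FLUX", "K-HEAT"],
    "S-METER": ["K-OHMS-LAW"],
}


def audit_trail(task_code: str) -> List[Tuple[str, str, str, str]]:
    """List every (task, element, skill, knowledge item) path for a task."""
    rows = []
    for element in TASK_ELEMENTS.get(task_code, []):
        for skill in ELEMENT_SKILLS.get(element, []):
            for item in SKILL_KNOWLEDGE.get(skill, []):
                rows.append((task_code, element, skill, item))
    return rows


def tasks_using_skill(skill: str) -> List[str]:
    """Walk the hierarchy upward: which tasks does a skill support?"""
    return sorted(
        task for task, elements in TASK_ELEMENTS.items()
        if any(skill in ELEMENT_SKILLS.get(e, []) for e in elements)
    )
```

Reading the tables downward gives the developer the knowledge items behind a
task; reading them upward shows how widely a skill transfers across tasks.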

The alternatives mentioned above represent current NEPDIS effort to marry
the already definitive job task information in the JTIs with equally definitive
supporting skill and knowledge data. The intent also is to maintain a visible
and easily followed audit trail throughout the Training Analysis Subsystem.

FIGURE 1. NEPDIS TASK LISTING: AVIONICS; TASK IN
AVIATION ELECTRICIAN'S MATE JOB/TASK
INVENTORY


FIGURE 2. OMNIBUS/EMBODIED TASK RELATIONSHIP
(AVIATION ELECTRICIAN'S MATE INVENTORY)

FIGURE 5. SKILL-SUPPORTING INFORMATION PACKAGE
FOR THE TRAINING PROGRAM DEVELOPER


THE JOB/TASK ANALYSIS/SKILLS AND KNOWLEDGE MARRIAGE (PART II)
T. V. Ansbro and W. A. Hayes

CNET/TAEG NEPDIS Development Team, CNET HQ, NAS Pensacola


This paper (Part II) illustrates input and eventual employment of model
"tasks, skills and knowledges" in the front-end job/task analysis subsystem
of NEPDIS (Naval Enlisted Professional Development Information System). The
four-part matrix displayed in Part I reflects a hierarchy with the task at
the top and the associated knowledge elements at the bottom. The task-to-task
element-to-component skill-to-component knowledge continuum provides an
audit trail for the use of the training program or curriculum developer,
whether the matrix is added to the existing master job/task inventory or
provided as an ancillary data bank for specialized use.

The principal goal is to set up a functioning audit trail (to justify a body
of job-related technical information as actually component to or clearly
underlying task performance). A secondary goal is to set up an
occupational-field data bank and computerized retrieval methodology to support
this aim. The outputs of front-end job/task/skill analysis can then be used both
to describe (even construct) jobs/billets and the tasks performed by their
incumbents, and to describe the skills and knowledge requirements for job
incumbency, certification, advancement, and associated training.

This paper (Part II) illustrates the input and eventual employment of model
"tasks, skills and knowledges" in the front-end job/task analysis subsystem of
NEPDIS (Naval Enlisted Professional Development Information System). The
principal goal is to justify a body of job-related technical information as
actually component to or clearly underlying task performance; a secondary goal
is to set up an occupational-field data bank and computerized retrieval
methodology to support this aim.

The initial input into the model (Figure 1) is a comprehensive inventory of
job/task statements consisting of specific actions to be performed on (or with)
actual, real-world (not generic) task objects with conditions, standards and
supporting descriptive factors. It cannot be over-emphasized that until all
the above data, as a minimum, have been collected, no total and thorough
analysis can be completed. The term "front-end analysis" as used in this paper
is intended to portray an analysis of all data collected at the front end of any
particular occupational data acquisition and analysis project. It is not
applicable to any system which collects only a portion of the required data,
conducts a portion of the analysis, and then returns to the same (or additional)
sources for another round of data collection.

Once  the  data  have  been  collected,  a  computerized  analysis  can  sort  tasks 
into  skill  levels,  pay  grades,  or  any  other  desired  distribution  based  on  the 
inherent  complexity  of  the  supporting  descriptive  factors  (Figure  2).  This 
first computer sort produces a crucial product by selecting from the total
inventory those tasks most appropriate for assignment to a particular group of
job  incumbents  (Figure  3).  The  initial  list  of  selected  tasks  then  serves  as 
an  input  into  other  computer  algorithms  which  produce  lists  of  billet-specific 
tasks  and  a  list  of  rating-specific  skills  (Figure  4).  The  skills  are  then 
further  analyzed  to  determine  knowledge  requirements. 
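
The first computer sort described above can be sketched as a threshold lookup
on a complexity index. The cutoff values below are purely hypothetical (the
paper gives none), the level names are borrowed from the skill level/pay grade
figure, and the complexity value 1.25 for task 00165 is the one quoted in
Part I; the other task codes are invented.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List

LEVELS = ["apprentice", "journeyman", "advanced journeyman"]
# Hypothetical lower bounds of "journeyman" and "advanced journeyman".
CUTOFFS = [1.0, 2.0]


def skill_level(complexity: float) -> str:
    """Map a task complexity index onto a skill level."""
    return LEVELS[bisect_right(CUTOFFS, complexity)]


def distribute(tasks: Dict[str, float]) -> Dict[str, List[str]]:
    """Group task codes by the skill level their complexity falls into."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for code in sorted(tasks):
        groups[skill_level(tasks[code])].append(code)
    return dict(groups)


# Example inventory: task code -> complexity index.
INVENTORY = {"00165": 1.25, "00200": 0.6, "00310": 2.4, "00311": 1.0}
```

The same pattern would serve for pay grades or any other distribution: only the
level names and cutoffs change, not the sorting logic.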

The billet-specific tasks can serve useful roles in determination of
manpower requirements, assignment of personnel, identification of specific
training requirements, advancement in rate, and certification and evaluation of
workers' performance on the job. Rating-specific skills serve as the standard
to which all members of a rating (regardless of billet to which assigned) must
be able to function in order to perform the tasks of a specific pay grade,
skill level, or other distribution structure. Knowledge requirements serve to
support absolute standards for advancement in rating and for certification in
work qualifications.

Once skills and knowledges have been identified and extracted from the
tasks, the selection of training sites and methods becomes the area of primary
concern (Figure 5). Skills and knowledges can be prioritized for training,
and those with highest priority would be assigned to a schoolhouse setting
(for instance, a class "A" school). The remainder would appropriately be
assigned to correspondence courses and other self-study modes. In most cases
the training manuals used as the basic text for schoolhouse courses would also
be used for follow-on self-study and correspondence courses.

Tasks, like skills and knowledges, can be prioritized for training. Those
with high priority would be assigned to schoolhouse settings (for instance,
class "C" and class "F" schools), while those of low priority would be
assigned to formal on-the-job training.

A total training continuum based on the above assignment to the various
training settings would ensure that a worker reporting to a new assignment
would have completed training for the skill and knowledge requirements
established as necessary to perform in his particular pay grade. It would
also ensure that he had been trained to perform tasks peculiar to his
particular billet. He would report to his work site already prepared to perform
useful work and, after a brief period of formal on-the-job training, would be
fully billet-qualified and able to perform all tasks assigned his billet.
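
The assignment rules described above reduce to a small decision table keyed on
item type and priority. The following sketch assumes a numeric priority score
and a single cutoff, neither of which is specified in the paper; the item codes
are likewise hypothetical.

```python
from dataclasses import dataclass

HIGH_PRIORITY = 0.5  # assumed cutoff; the paper does not define one


@dataclass(frozen=True)
class TrainingItem:
    code: str        # hypothetical identifier
    kind: str        # "task", "skill", or "knowledge"
    priority: float  # assumed score between 0 and 1


def training_setting(item: TrainingItem, cutoff: float = HIGH_PRIORITY) -> str:
    """Pick a training setting from the item's kind and priority."""
    high = item.priority >= cutoff
    if item.kind == "task":
        return 'class "C"/"F" school' if high else "formal on-the-job training"
    if item.kind in ("skill", "knowledge"):
        return 'class "A" school' if high else "correspondence/self-study"
    raise ValueError(f"unknown item kind: {item.kind!r}")
```

Keeping the rule in one function makes the continuum easy to audit: every item
in the inventory resolves to exactly one setting.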

Job certification and advancement are other areas which would also be
served by an automated analysis of occupational data (Figure 6). A
task-specific billet description for a worker's next billet and the three
elements of training (tasks, skills, and knowledges) are the input items for
the algorithm. A worker's ability to perform tasks and skills would be
evaluated by hands-on performance tests, and his possession of adequate
knowledge would be evaluated by paper and pencil tests. Once he
had demonstrated proficiency in the three elements required for a new billet,
he would become eligible for advancement. He could then advance into the new
billet and begin functioning as a fully qualified and certified incumbent;
upon meeting other requirements such as military leadership, time-in-rate, etc.,
he could automatically be advanced to the next pay grade commensurate with the
billet. The Service as well as the individual would receive fair and
equitable compensation.
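
The eligibility check described above amounts to comparing a billet's
requirements against a worker's test record. The sketch below is an assumption
about how such a comparison might be coded; the record layout and all codes are
hypothetical, and other advancement requirements (leadership, time-in-rate) are
deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Billet:
    tasks: Set[str]
    skills: Set[str]
    knowledges: Set[str]


@dataclass
class WorkerRecord:
    performance_passed: Set[str]  # tasks and skills passed hands-on
    written_passed: Set[str]      # knowledge items passed on paper


def certification_gaps(billet: Billet, record: WorkerRecord) -> Dict[str, Set[str]]:
    """Return the requirements the worker has not yet demonstrated."""
    return {
        "tasks": billet.tasks - record.performance_passed,
        "skills": billet.skills - record.performance_passed,
        "knowledges": billet.knowledges - record.written_passed,
    }


def eligible(billet: Billet, record: WorkerRecord) -> bool:
    """Eligible once no task, skill, or knowledge requirement is missing."""
    return not any(certification_gaps(billet, record).values())
```

Returning the gaps rather than a bare yes/no also tells the worker which
training remains before the next billet.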

FIGURE. RATIONALIZATION OF JTI DISTRIBUTION INTO SKILL LEVEL/PAY GRADE GROUPS
(boundaries shown: exit trainee / enter apprentice; exit apprentice / enter
journeyman; exit journeyman / enter advanced journeyman)


DEVELOPMENT OF THE GENERAL WORK INVENTORY


Rodger  D.  Ballentine,  Maj 
Air  Force  Institute  of  Technology 

J.  W.  Cunningham,  Professor  of  Psychology 
North  Carolina  State  University 


The services have a sophisticated task-based job analysis system which
provides invaluable training and job descriptive information within
occupational areas. Increasingly, however, requests are made for the comparison
of related work functions or variables (e.g., skills) across occupational areas.
A practical quantitative measure for comparing work activities in different
career fields would facilitate broad description, comparison, and
classification of such occupational information. This paper outlines a study to
develop a structured questionnaire consisting of work activity/condition
descriptors applicable to the entire occupational spectrum. In addition to
outlining the potential applications of such a system, we will describe plans
to rate a sample of enlisted Air Force jobs, assess instrument reliability
and validity, and group jobs based on their work dimension profiles.

INTRODUCTION TO WORK STUDY

Historically, the idea of man's work has religious, ethical, social,
and economic implications. Work is either a burdensome necessity, a means
to an end, or a creative and valued act of man (Prien & Ronan, 1971). In
this discussion, work is viewed as a process whereby one exerts effort to
transform various inputs into prescribed outcomes. Since the turn of the
century, job analysis has been synonymous with work study, as explained by
McCormick and Tiffin (1974): "Job analysis can be considered as embracing
the collection and analysis of any type of job-related information, by any
method, for any purpose. Perhaps it can be defined more generally as the
study of human work" (p. 49).

The field of job analysis includes both qualitative (conventional) and
quantitative (structured) methods for collecting worker attribute and work
performance information. The conventional approach is useful for describing
specific jobs; however, such information is not generalizable. On the other
hand, the structured approach provides a consistent framework for data
collection and therefore the ability to compare and numerically classify the
units of analysis. Frequently, confusion in this area of investigation
results from inconsistently defined and applied terminology. In an effort to
reduce further confusion and assist the reader, several terms pertinent to
this discussion are defined below.

A  work  element  is  a  description  of  various  kinds  of  work  activities 
or  conditions  on  which  positions  or  jobs  can  be  rated.  Typically, 
such  elements  are  general  enough  to  be  applicable  to  a  wide  variety 
of  jobs. 

A work dimension is a statistically derived construct representing
work elements which commonly occur together in positions/jobs.

A job is a group of positions in which major tasks are similar
enough to justify a single analysis within or across organizations.

A job family/cluster is a group of jobs or occupations which have
common characteristics.

The quantitative approach promotes more objective and systematic
investigation of the relationship between work units. Commonly, tasks, duties,
and work elements/dimensions are used to analyze positions, jobs, or
occupations. Molecular analysis of task and duty similarity between positions
or jobs is typically done within the same organization, whereas more molar
analysis of work element/dimension similarity between jobs and occupations
is typically done across organizations. The relationship between these units
of job analysis forms a hierarchy for the study of work (Pearlman, 1980).
Individuals performing tasks represent positions, positions containing
similar tasks are grouped to form jobs, and jobs with common work
elements/dimensions are grouped to form job families. A pictorial
representation of these levels of work and descriptive measures, adapted from
Pearlman (1980), is presented in Figure 1.

QUANTITATIVE  WORK  TAXONOMIC  RESEARCH 

Prien and Ronan (1971) drew two conclusions from a review of job
analytic research. First, they found that few studies provide basic and
generally useful information about work, workers, and occupations. Second,
data from various job taxonomic efforts were not comparable because of
procedural and measurement differences. Jones and DeCotiis (1969) conducted a
national survey of firms about their use of job analysis and attributed
management dissatisfaction with these programs to a lack of standard
quantitative techniques for gathering, recording, and presenting job
information. Pearlman (1980) echoed the importance of systematic job analytic
procedures in the development of a taxonomy of work performance:

The development of strategies for classifying and grouping
jobs in some systematic fashion thus appears to be an
essential step in the effort to devise a unified taxonomy of
work performance, that is, one that addresses the relevant
characteristics of both people and jobs. (p. 3)

From  both  the  theoretical  and  applied  standpoint,  standard  quantitative  work 
measurement  procedures  are  important  in  the  study  of  work. 

Systematic  procedures  for  studying  jobs  are  available  from  quantitative 
job  analytic  research.  These  methods  most  often  employ  structured  job 
questionnaires  to  gather  information  about  the  work  accomplished  in  jobs 
within  specified  occupational  areas.  Probably  the  best  example  of  this 
methodology  is  the  task-inventory  approach  developed  by  the  USAF  and  other 
services (Christal, 1974). Overall, the structured job analysis
questionnaire has proven to be an economical and reliable tool for collecting
job information. Taxonomic research to define common work denominators suggests
new uses for this data collection technology.

Paralleling these improvements in data collection are conceptual
advances promoting theoretical and practical interest in work taxonomic research
(Cunningham, 1974; McCormick, 1979; McKinlay, 1976; Pearlman, 1980).
Cunningham (1974) called the required technology for taxonomic definition
and measurement of work "Ergometrics," and defined it as "the application of
psychometric principles and procedures to the study of human work" (p. 7).

Both McCormick and Cunningham have identified work activity/condition
descriptors to serve as common denominators in job description and
classification. These descriptive statements, applicable to jobs throughout the
occupational spectrum, are promising variables for a comprehensive work
taxonomic system. Moreover, job elements can be linked to defined human
attributes for which there are tests (McCormick, Jeanneret, & Mecham, 1972;
Pass & Cunningham, 1975), and can be used to cluster jobs with similar
activity or attribute requirements (Pass & Cunningham, 1978; Shaw, DeNisi,
& McCormick, 1977).

Figure 1. An example of work analysis within an organization. Adapted from
Pearlman (1980).

It is our thesis that the required tools for a comprehensive work
description and classification system are psychometrically based job activity
questionnaires (Cunningham, 1974). Two such questionnaires have been
constructed with consideration of the "job-oriented" and "worker-oriented"
dichotomy in work activity descriptors proposed by McCormick (McCormick,
Cunningham, & Gordon, 1967). The Position Analysis Questionnaire (PAQ) is
primarily made up of worker-oriented activity statements (McCormick et al.,
1972), while the Occupation Analysis Inventory (OAI) is both job- and
worker-oriented (Cunningham, Tuttle, Floyd, & Bates, 1974). Both instruments
portray work through an information-processing model with similar item
categories. These inventories are representative of the state of the art in
collection of quantified job activity information, and their descriptive
taxonomies reflect job- and worker-oriented activities common to the world
of work. However, widespread data collection with both of these instruments
is somewhat limited by their complexity and the demands they place on the
rater.


PROPOSED  STUDY 

Existing job activity inventories are often impractical because trained
personnel or highly educated raters are required to collect job data with
them. This study has two primary goals. First, we plan to develop a
generally applicable structured work questionnaire which is simple enough to
be completed by job incumbents. This instrument should be brief, clear to
the typical job holder, and present a straightforward job rating task.
Second, we hope to demonstrate how the questionnaire can be used to define
activity/condition dimensions of jobs and how these dimensions can be used
to meaningfully classify jobs. A practical questionnaire would not only
facilitate the collection of quantitative information about jobs, but would
also promote the use of a common language between employer and employee. A
description and classification system based on such an instrument could find
use in both personnel management and employment counseling.

In the proposed study, we plan to apply the previous OAI research to
the following objectives:

a. Develop a set of work descriptors based on factor analyses of the
original OAI items by Boese and Cunningham (1976).

b. Incorporate the resulting work variables into a structured
questionnaire which can be administered to job incumbents. This instrument will
be referred to as the General Work Inventory (GWI).

c.  Apply  the  GWI  to  a  sample  of  Air  Force  jobs  performed  by  skilled 
personnel.  Different  rater  groups  will  analyze  the  same  and  different  jobs. 
The  resulting  ratings  will  be  used  to  determine  item  reliabilities  within 
groups  and  rater  agreement  across  groups. 

d. Determine the factor structure of the GWI elements and investigate
the construct validity of the resulting GWI dimensions. Dimension validity
will require evidence of factor stability and relevance to human attribute
and job content criteria. For example, relationships between the GWI
dimensions and worker aptitudes will be determined by regressing the GWI factor
scores and aptitude-requirement estimates for a sample of jobs against their
mean incumbent aptitude test scores. (Established aptitude-requirement
indices for jobs, such as relative learning load or cutoff scores, might
also serve as criteria in the regression analyses.) In addition, it will
be determined if existing Air Force job categories (i.e., content groups
and groups based on the principal aptitude requirement) are significantly
discriminable in terms of their mean GWI factor scores and
aptitude-requirement estimates.

e. Apply the GWI dimensions to the development of a job taxonomy by
grouping the sample jobs on their factor-score profiles. The obtained
cluster structure will be checked for stability and meaningfulness. The
Air Force classification scheme and overall job-similarity groupings will
be used as criteria for comparison to the GWI-based job taxonomy.
Additionally, if the GWI dimensions are relevant to aptitude requirements, the
mean aptitude test scores of job incumbents should differ significantly
across clusters.

The GWI is designed as a short version of the QAI which can be administered
to job incumbents and experts. "Man is viewed as an information
processing system which transforms information input into prescribed
outcomes" (Cunningham et al., 1974, p. 8). This framework is suited to the
process view of work (i.e., input-throughput-output) described earlier and
relates to existing inventories as presented in Figure 2. The GWI elements
are generated primarily from the QAI work dimensions, which represent types
of tasks and conditions commonly occurring in the world of work. These
elements should be suitable for job description and relevant to measured
human attributes. The dimensions derived from GWI elements will cover
broader aspects of related work which can be used to profile jobs for
description, comparison, and classification purposes.

The  target  population  in  this  study  consists  of  the  jobs  performed  by 
skilled  (i.e.,  5-level)  enlisted  personnel  in  the  USAF.  The  Air  Force 
Military  Personnel  Classification  System  groups  positions  in  which  related 
work is performed into specialties. (Jobs and specialties will be used
interchangeably  in  this  discussion.)  The  underlying  principle  of  specialty 
formation  is  that  the  positions  included  have  similar  work  requirements  and 
therefore  require  similar  abilities.  The  specialty  will  be  the  primary  unit 
of  analysis  in  this  study.  A  sample  of  at  least  200  5-level  jobs  (specialties) 
will  be  selected  to  represent  the  existing  classification  scheme.  This  sample 
should  provide  a  reasonable  representation  of  the  ability  and  other  work 
requirements  in  the  target  population. 

Participants in this study will perform one of the following tasks:
rate a position or job with the GWI, judge the pairwise similarity within
a job subsample, or rate the relevance of Armed Services Vocational
Aptitude Battery (ASVAB) composite areas to GWI elements. Four different
groups of personnel will rate enlisted 5-level specialties: job incumbents
(INC), their supervisors (SUP), occupational analysis specialists (OAS), and
promotion test development personnel (PTD). Ratings by non-INC groups will
be obtained to determine the reliabilities of different rater sources and
agreement among these sources. Senior promotion test reviewers and
occupational analysis personnel from the Occupational Measurement Center and
career field functional representatives from the Manpower and Personnel
Center will each judge the pairwise similarities of approximately 25 jobs.
(All non-INC/SUP job raters and similarity judges will review formal job
descriptions and relevant occupational analysis information when completing
their respective rating task.) The pairwise job similarity ratings will be
used to create a job similarity matrix as a basis for grouping jobs. Several
ASVAB test development psychologists from the Human Resources Laboratory and
personnel psychology graduate students and staff from North Carolina State
University will rate the relevance of major ASVAB areas to GWI elements. The
mean relevance ratings for each element will be used as weights to derive
aptitude-requirement estimates for jobs, according to McCormick's job
component validity procedure (McCormick, Cunningham, & Thornton, 1967).

Figure 2. Work analysis frameworks.
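
The weighting step described above can be illustrated with a minimal sketch. It assumes that a job's aptitude-requirement estimate is a relevance-weighted average of its element ratings; the text says only that mean relevance ratings serve as weights, so the exact combination rule, the function name, and the example elements and aptitude areas are all hypothetical.

```python
def aptitude_requirement_estimates(job_ratings, relevance_weights):
    """Combine element ratings with element-by-aptitude relevance weights.

    job_ratings:       {element: rating of the job on that element}
    relevance_weights: {element: {aptitude_area: mean relevance rating}}
    Returns {aptitude_area: weighted average of the job's element ratings}.
    The weighted-average form is an assumption made for illustration.
    """
    weighted_sums = {}
    weight_totals = {}
    for element, rating in job_ratings.items():
        for area, weight in relevance_weights[element].items():
            weighted_sums[area] = weighted_sums.get(area, 0.0) + weight * rating
            weight_totals[area] = weight_totals.get(area, 0.0) + weight
    return {
        area: weighted_sums[area] / weight_totals[area]
        for area in weighted_sums
        if weight_totals[area] > 0
    }


# Hypothetical data: two elements, two aptitude areas.
example_ratings = {"element_1": 4, "element_2": 2}
example_weights = {
    "element_1": {"Mechanical": 3, "Administrative": 1},
    "element_2": {"Mechanical": 1, "Administrative": 3},
}
example_estimates = aptitude_requirement_estimates(example_ratings, example_weights)
```

Elements judged more relevant to an aptitude area pull that area's estimate toward the job's rating on those elements.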

The process of collecting GWI job ratings is relatively straightforward.
Questionnaires will be mailed to INC and their SUP, and ratings by OAS and
PTD will be controlled by a liaison at the Occupational Measurement Center.
Several INC in each job will rate their positions with or without the
assistance of their SUP. A subset of 40-50 jobs, in career ladders undergoing
Staff Sergeant promotion test revision during the administration period,
will be rated by all groups. For these commonly rated jobs, INC and their
SUP will independently rate the INC's position, whereas in all the other
150-160 jobs INC and SUP may work together to rate the INC's position. In
order to collect three ratings for each job from non-INC/SUP groups, we
will ask each OAS to rate about four of the common jobs and each PTD to rate
the 5-level job for the career ladder in which they are developing the
promotion test.

The  following  analyses  will  be  performed: 

a. Interrater item reliability analyses. These analyses will be
carried out within each of the four rater groups (INC, SUP, OAS, and PTD),
and ratings will be compared for agreement across the four groups.

b. Computation of GWI aptitude-requirement estimates for jobs, following
McCormick's job component procedure. These estimates will be derived
by combining job ratings on the GWI elements with those elements'
aptitude-requirement weights.

c. Factor analyses of the GWI elements. These will include factor
analyses of sections (or groups) of GWI elements, as well as an overall
analysis of the entire set of elements. A factorial stability analysis will
involve: (1) dividing the job sample into two equivalent subsamples, (2)
performing independent factor analyses with both subsamples, and (3)
comparing the results for replication across the subsamples.

d. Cluster analyses of jobs. The jobs in the sample will be cluster
analyzed on two bases: (1) similarities among their GWI factor-score
profiles and (2) analysts' pairwise job similarity judgments. The clusters
derived from these two data bases will be compared for agreement with each
other and with the Air Force classification scheme.

e. Several construct validation analyses will be carried out, including:
(1) regression analyses of jobs' GWI factor scores and ability-requirement
estimates against the ability test scores of job incumbents (and possibly
against the established test cutoff and relative learning load scores for
the jobs); (2) analyses to determine the discriminability of GWI-derived job
clusters in terms of the ability test scores of incumbents in the jobs
comprising the clusters; (3) analyses to determine the discriminability of
existing Air Force job categories (both content and aptitude groupings) in
terms of the GWI factor scores and ability-requirement estimates of the jobs
comprising those categories; and (4) the cluster comparisons mentioned in
paragraph d above.
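
The cluster comparisons in paragraph d need some index of agreement between two groupings of the same jobs. The proposal does not name one; the sketch below uses the Rand index purely as an illustrative assumption, with hypothetical cluster labels.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Proportion of job pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two jobs
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same jobs")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two jobs are required")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster labels for six jobs from the two data bases.
gwi_profile_clusters = [0, 0, 1, 1, 2, 2]
similarity_judgment_clusters = [0, 0, 1, 1, 1, 2]
agreement = rand_index(gwi_profile_clusters, similarity_judgment_clusters)
```

The same function could compare either clustering with the existing Air Force classification, treating each career field as a cluster label.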


WORK  CLASSIFICATION  SYSTEM  UTILITY 

Successful  performance  of  the  GWI  in  this  study  would  support  continued 
research  of  this  general  approach  in  Air  Force  work  analysis.  An  effective 
occupational information system would cover a range of descriptive
specificity, from task statements (applicable to restricted groups of
positions) to more general work elements and dimensions (applicable to the
broad occupational spectrum). The utility of the system would also depend
on its ability to interrelate the different types of descriptive variables
(e.g., tasks, work elements/dimensions, and human attributes). The task
inventory would remain the major methodological component in such a system,
supplemented by one or more instruments containing descriptors of a more
general nature.

A quantitative work taxonomy would have numerous applications for
describing, relating, and researching the characteristics of jobs and workers.
The identification of work variables applicable to the entire occupational
domain could provide a basis for describing and classifying a wide variety
of jobs within the same system. The linkage of these work descriptors to
measurable human attributes would permit determination of relationships
between people and jobs. In a sense, this relational mapping could serve
as a table of contents for understanding commonalities in the work activities
and human requirements of jobs and job families. Such a system could
facilitate a variety of human resource development efforts. For example,
applications might be found in such areas as occupational exploration and
guidance, recruitment, job transfer, career development, training, test
development, and selection/assignment.

The Usefulness of Selection Tests


In many selection situations, some predictor of success, such as
a test, is used to discriminate between the applicants to be accepted
and those to be rejected. Using an interpretation of Brogden (1946),
this paper shows the effect of using a correlated variable on the mean
criterion score of the selected group. The distribution of the means
of all possible selections is discussed and a simple formula is provided
for calculating the probability of achieving as high a mean or higher
using random selection as opposed to using a correlated variable.
It is shown that even fairly small correlations result in selection
decisions that are not often exceeded by chance. The Brogden interpretation
and significance test are shown to extend simply to the case of
several predictors.

THE USEFULNESS OF SELECTION TESTS


Albert  E.  Beaton  and  John  L.  Barone 
Educational  Testing  Service 

Introduction 

In many admissions decisions a selection test is used to decide which
members of an applicant pool will be selected for the available openings.
The correlation coefficient between the selection test and the college's
criterion of success, Grade Point Average, say, is often in the range of
.40 to .55 for the applicants who are selected and then enroll. Such
correlations are often squared and then interpreted as the proportion of
variance explained by the predictor. Such a statement is true but is very
difficult for an admissions officer to relate to the effect on his
entering class. Some better interpretive scheme is needed.

The  purpose  of  this  paper  is  to  show  the  effect  of  using  a  selection 
test  on  the  average  criterion  score  of  the  entering  class.  The  correlation 
coefficient  (times  100)  is  shown  to  be  the  percentage  of  improvement  of 
using  the  selection  test  over  what  would  happen  on  the  average  if  the  test 
were not used. Further, a simple formula is developed for approximating
the  probability  that  selection  by  chance  would  yield  an  entering  class  as 
able  or  abler  than  a  class  selected  with  the  help  of  a  valid  predictor. 

It is further shown that, for reasonably sized colleges and with reasonable
assumptions, selection using even small correlations will almost certainly
result in an entering class that would earn higher criterion scores than
would occur without the selection test.

1. The Effect of Different Selection Procedures

Let  us  assume  that  an  admissions  officer  has  the  task  of  filling  n 
available  openings  in  a  college  from  a  pool  of  N  applicants.  We  will 
call the selection ratio f = n/N. We will assume throughout that his aim
is  to  select  the  "best"  applicants  in  the  sense  that  they  are  the  students 
who  will  perform  best  on  some  criterion  of  college  success  such  as  grade 
point  average  (GPA) .  The  score  on  the  criterion  variable  is  not,  of  course, 
known  at  the  time  of  admission  decision  and  is  never  known  for  applicants 
who  are  not  accepted  and/or  do  not  enroll. 

Let us now number the applicants with an index $i$ ($i = 1, 2, \ldots, N$) and
name the criterion variable $y$. The applicant pool is the population of
interest; we will not assume that it is a sample from some larger
population. The score of applicant $i$ on the criterion variable $y$ is $y_i$,
which is the score that the applicant would receive on the success criterion
if the applicant were selected and did enroll. For convenience and without
loss of generality, we will consider the $y_i$ values to be in standard form,
that is, to have an average value $\bar{y}$ of zero and a standard deviation $s$
of one. We do not assume, at this point, that the distribution of the $y_i$
is normal.

Let us first consider what would happen in some extreme selection
situations. First, let us assume that the admissions officer is prescient,
that is, that he knows exactly how each applicant would perform on $y$ if
admitted and enrolled. In this case, the admissions process would be
straightforward; the $n$ candidates who would receive the highest $y_i$ would
be accepted and in this way the average score on the criterion would be
maximized. The decision rule is clear, except for tied $y_i$ at the cut-off
point, which could be selected randomly. We will call the group selected
this way the optimum group and the mean criterion score of that group will
be called $\bar{y}_1$. Another extreme admissions situation is the case in which
the admissions officer knows nothing about the applicant pool, at least
nothing that is related to the success criterion. We can consider this
case as knowing the applicant's index number, $i$, and no more. In this
case, the admissions officer might select applicants at random and hope
for the best. There is a tiny probability that the resultant admittees
would be the optimum group, but there is also a tiny probability of
selecting the group with the lowest possible average score on $y$. Yet, with
no useful predictive information available, there would be little else
that could be done.


However, although we do not know what will happen if the applicants
are selected at random, we do know everything that could happen. First, we
know that there are precisely

$$C = \frac{N!}{n!\,(N - n)!} \tag{1}$$

different ways in which $n$ students can be selected from a pool of $N$
applicants. Let us number them with the index $c$ ($c = 1, 2, \ldots, C$). Each
possible selection would result in a mean score for the criterion which we
will label $\bar{y}_c$. We also know that the average of all possible $\bar{y}_c$ is

$$\operatorname{ave}(\bar{y}_c) = \bar{y} = 0 \tag{2}$$

that is, not surprisingly, that the average of all samples of size $n$ is
the average of the applicant pool, which is zero. We also know that the
variance of the $C$ possible $\bar{y}_c$ is


$$\operatorname{var}(\bar{y}_c) = \frac{1 - f}{n} \tag{3}$$

since the variance of the $y_i$ is unity. Proofs of the average and variance
are shown in Cochran [1977]. If the student selection were at random, then
any of the $C$ groups is as likely to occur as any other, and thus the
average mean criterion score over a large number of random assignments would
approach zero. We will label the average mean as $\bar{y}_0$.
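
Equations (1) to (3) can be checked by brute force on a pool small enough to enumerate. The sketch below is illustrative only: the eight raw scores are made up, and it assumes the standard deviation is computed with the N - 1 divisor (Cochran's finite-population convention), under which equation (3) holds exactly.

```python
from itertools import combinations
from math import comb
from statistics import mean, pvariance, stdev


def standardize(scores):
    """Rescale scores to mean zero and unit standard deviation (N - 1 divisor)."""
    centre, spread = mean(scores), stdev(scores)
    return [(value - centre) / spread for value in scores]


def all_selection_means(y, n):
    """Mean criterion score of every possible selection of n applicants."""
    return [sum(group) / n for group in combinations(y, n)]


# Hypothetical pool of N = 8 applicants, of whom n = 3 are selected.
pool = standardize([2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 0.5, 3.8])
N, n = len(pool), 3
f = n / N

selection_means = all_selection_means(pool, n)
number_of_selections = len(selection_means)      # equation (1)
average_of_means = mean(selection_means)         # equation (2)
variance_of_means = pvariance(selection_means)   # equation (3)
```

Here `number_of_selections` equals `comb(N, n)`, the average of the selection means is zero up to rounding, and their variance matches `(1 - f) / n`.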

Let us now consider a less extreme, and more realistic, situation in
which the admissions officer has some imperfect information about the
applicants. Let us assume, for example, that each applicant's dossier contains
his score on an admissions test. We will call that test $x$ and the score of
applicant $i$ on that test $x_i$. $x_i$ is known for all applicants. For
convenience, we will also assume that the scores on $x$ are in standard form
with zero mean and unit variance. Let us also assume that, from experience,
we know that the linear correlation of $x$ with $y$ is $\rho$. We will assume
throughout that $\rho \ge 0$. One possible admissions rule is to select the $n$
students with the highest scores on $x$. The question we wish to explore
here is how much better the admittees will be if selected using $x$ rather
than at random.

We do know something about what will happen if $x$ is used for selection.
First, we can partition the criterion scores into two parts

$$y_i = \rho x_i + e_i \tag{4}$$

where the first part $\rho x_i$ is the "predicted" value of $y_i$ and $e_i$ is a
residual. Let us call the average $y_i$ for the selected group $\bar{y}_\rho$. If we
select the $n$ applicants with the highest scores on $x$, then the value of
$\bar{y}_\rho$ is

$$\bar{y}_\rho = \rho\,\frac{1}{n}\sum_{+} x_i + \frac{1}{n}\sum_{+} e_i \tag{5}$$

where $\sum_{+}$ means summing the scores of the applicants who have the top $n$
values of $x$. If the relationship between $y$ and $x$ is linear, that is,
the conditional mean of $y$ given $x$ is $\rho x$, then the term
$\frac{1}{n}\sum_{+} e_i$ will vanish and thus we may say

$$\bar{y}_\rho = \rho\,\bar{x}_{+} \tag{6}$$

where $\bar{x}_{+}$ is the average $x$ score of the top $n$ applicants.


Comparison of the average criterion score of selected groups under
optimum information, $\bar{y}_1$, no information, $\bar{y}_0$, and under information
from some correlate, $\bar{y}_\rho$, leads to an interesting interpretation of the
correlation coefficient which was first proposed by Brogden [1946]. Under
the assumption that the distribution of $y$ is the same as the distribution
of $x$, the average of the top $n$ scores on $y$ is the same as the average of
the top $n$ scores on $x$, although the actual persons having those scores may
be different. Under this assumption, therefore, $\bar{x}_{+} = \bar{y}_1$, and

$$\rho = \frac{\bar{y}_\rho}{\bar{y}_1} \tag{7}$$

that is, the correlation coefficient (not squared) is the ratio of the mean
of the group selected using $x$ to the mean of the optimum group. Since the
average mean of random selections is zero, $100\rho$ may be interpreted as the
percent of possible improvement in selection attained by using the selection
test over what would happen, on the average, using random or any arbitrary
selection procedure that is independent of $y$.
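
The ratio interpretation can be illustrated by simulation. The sketch below is not part of the original paper: it assumes bivariate normal scores built from the partition in equation (4), uses a hypothetical pool far larger than any real applicant pool so that sampling noise is small, and fixes the random seed for repeatability.

```python
import random
from statistics import mean


def brogden_ratio(rho, pool_size, n_selected, seed=0):
    """Ratio of the mean y of the top-n-on-x group to the mean y of the
    top-n-on-y (optimum) group in one simulated applicant pool."""
    rng = random.Random(seed)
    residual_sd = (1.0 - rho ** 2) ** 0.5
    applicants = []
    for _ in range(pool_size):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + rng.gauss(0.0, residual_sd)  # y_i = rho * x_i + e_i
        applicants.append((x, y))
    by_test = sorted(applicants, key=lambda a: a[0], reverse=True)[:n_selected]
    optimum = sorted(applicants, key=lambda a: a[1], reverse=True)[:n_selected]
    return mean(y for _, y in by_test) / mean(y for _, y in optimum)


# With rho = .5 and the top 10% selected, the ratio should be close to .5.
simulated_ratio = brogden_ratio(rho=0.5, pool_size=20000, n_selected=2000)
```

The simulated ratio differs from the correlation only by sampling error, which shrinks as the pool grows.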

Figure 1 shows the relationships among $\bar{y}_0$, $\bar{y}_\rho$, and $\bar{y}_1$ in the
case where the top 10% of the applicants are to be selected. The abscissa
represents scores on $y$ and the ordinate is proportional to the number of
applicants with a particular score. The normal distribution was selected
for pictorial purposes. The shaded portion of the curve represents the
top 10% of the applicants, which has a cut-off score at +1.28. The average
score of the top 10% is +1.755, which is the mean of the optimum group,
$\bar{y}_1$. The average random selection, $\bar{y}_0$, is zero. The line on the graph
between $\bar{y}_0$ and $\bar{y}_1$ has tick marks to show where the mean of the selected
group would be if the correlation with the selection test, $\rho$, were
+.10, .20, ..., .90. If $\rho = 0$, then the effect of the selection test would
be the same as $\bar{y}_0$, and if $\rho = 1$, the effect is the same as $\bar{y}_1$.

Insert  Figure  1  about  here 


2.  The  Probability  of  Exceeding  by  Chance 

The improvement of the mean of the entering class using a small
correlation, $\rho = .10$ or $.20$, say, may not, at first, seem worth the cost of
the testing effort, but this is seldom so. The alternative would be to leave
the selection to chance, since there is at least some probability that the
chance selection would be better than the selection by a test. The question
to be asked now is: what is the probability of actually selecting a sample
by chance which results in an average $y$ as high as or higher than the mean
of those selected by a test? The answer to this question is conceptually
simple, since all we would have to do is enumerate all $C$ possible samples,
count the number of samples which have means as high as or higher than
$\bar{y}_\rho$, and divide that number by $C$ to find the proportion as high as or
higher than $\bar{y}_\rho$. However, this direct solution is not feasible, since
even for a small problem like selecting 10 or 90 out of a sample of 100 the
value of $C$ is approximately $1.73 \times 10^{13}$. However, a reasonable
approximation to that proportion is possible and is used here.
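
The direct solution is easy to write down even though it cannot be run on a realistic pool. The sketch below shows the enumeration on a hypothetical four-applicant pool and computes the number of selections for the 10-out-of-100 case mentioned above; the function name and toy scores are illustrative.

```python
from itertools import combinations
from math import comb


def exact_chance_proportion(y, n, target_mean):
    """Proportion of all selections of size n whose mean is >= target_mean."""
    hits = sum(1 for group in combinations(y, n) if sum(group) / n >= target_mean)
    return hits / comb(len(y), n)


# Toy pool: only the selection (1, 2) reaches a mean of 1.5, so 1 of 6.
toy_proportion = exact_chance_proportion([-1.0, 0.0, 1.0, 2.0], 2, 1.5)

# Number of ways to select 10 (or, equivalently, 90) out of 100 applicants.
selections_of_10_from_100 = comb(100, 10)
```

Enumerating some seventeen trillion selections is impractical, which is why an approximation is used instead.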

FIGURE 1

DISTRIBUTION OF APPLICANT POOL

$\bar{y}_0$ IS THE AVERAGE OF ALL RANDOM SELECTIONS
$\bar{y}_\rho$ IS THE AVERAGE OF GROUP SELECTED WITH $\rho = .5$
$\bar{y}_1$ IS THE AVERAGE OF THE OPTIMUM GROUP
$f = n/N = .1$

CROSSHATCH AREA REPRESENTS TOP 10% OF DISTRIBUTION

The ability to compare random selection against selection by $x$ gives
an opportunity to show the importance of even small correlations in selecting
an entering class. Figure 2 shows the probability of doing as well or
better by chance than by a selection test, using the assumption that $y$ in
the unselected population was normally distributed with zero mean and unit
variance.

Probabilities  are  graphed  for  applicant  pools  of  100  and  1000,  for 
different  selection  ratios,  and  for  different  values  of  p  .  We  see  that, 
in  selecting  50  out  of  100  applicants,  a  correlation  of  .30  is  sufficiently 
high  that  chance  selection  would  yield  a  higher  mean  than  selection  using 
the test not more than one time in a hundred. If the correlation is .40,
the selected group may range from 10 to 90 and the probability that a group
selected by chance will excel the group selected by the test will not exceed
one in a hundred. With an applicant pool of 1000, a correlation of .10 is
large  enough  so  that  the  probability  of  selecting  a  group  of  500  with  a 
larger  mean  by  chance  selection  is  less  than  .01  and  a  correlation  of  .30 
is  large  enough  so  that  the  probability  of  selecting  a  group  with  a  larger 
mean  by  chance  is  about  1/1000  or  less.  Thus,  the  use  of  a  test  with  even 
a  modest  correlation  will  almost  certainly  result  in  a  higher  average  of  y  . 
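
The paper's own formula is not reproduced in this part of the text, so the following is only a sketch under stated assumptions: the means of random selections are treated as approximately normal with mean zero and the variance of equation (3), and the criterion is taken to be normally distributed so that the optimum-group mean is the normal density at the cut-off divided by f. Under those assumptions the sketch gives probabilities consistent with the values quoted above.

```python
from math import sqrt
from statistics import NormalDist


def chance_probability(rho, pool_size, n_selected):
    """Approximate probability that a random selection has a mean at least
    as high as that of the group selected with a test of correlation rho."""
    if not 0 < n_selected < pool_size:
        raise ValueError("n_selected must be between 1 and pool_size - 1")
    f = n_selected / pool_size
    std_normal = NormalDist()
    optimum_mean = std_normal.pdf(std_normal.inv_cdf(1.0 - f)) / f
    selected_mean = rho * optimum_mean              # equation (6)
    sd_of_random_means = sqrt((1.0 - f) / n_selected)  # equation (3)
    return 1.0 - std_normal.cdf(selected_mean / sd_of_random_means)


p_50_of_100 = chance_probability(0.30, 100, 50)
p_10_of_100 = chance_probability(0.40, 100, 10)
p_90_of_100 = chance_probability(0.40, 100, 90)
p_500_of_1000 = chance_probability(0.10, 1000, 500)
```

Each of the four cases evaluated here comes out below one in a hundred, in line with the statements in the paragraph above.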


Insert  Figure  2  about  here 


Figure 3 shows why even such small correlations are useful in selection.
Superimposed on Figure 1 is the approximate distribution of $\bar{y}_c$
for the case where the applicant pool is 100 and but 10 are to be selected.
The mean of this distribution is zero and the standard deviation is
$\sqrt{.09} = .3$. The distribution of $\bar{y}_c$ is very tall near its mean when
compared to the distribution of $y_i$, which indicates that most of the
$\bar{y}_c$ are close to the average and there are few large deviant values. It is
true that the maximum value of $\bar{y}_c$ is $\bar{y}_1$, but the probability of this
occurring by chance is approximately $.577 \times 10^{-13}$. It is to this
distribution of $\bar{y}_c$ that the value $\bar{y}_\rho$ should be compared, and it is
clear that even for this very small problem the admissions officer is
unlikely to improve on $\bar{y}_\rho$ by chance. For more realistic situations where
the applicant pool is much larger, the distribution of $\bar{y}_c$ is even more
spiked and thus exceeding $\bar{y}_\rho$ by chance is even less likely for many
reasonable values of $\rho$.


Insert Figure 3 about here

CODAP - AN APPLICATION TO TRAINING
INFORMATION SYSTEM ANALYSIS

Training Design Officer (CODAP)

Royal Naval School of Educational and Training Technology
HMS NELSON, Portsmouth PO1 3HH, England


Summary 

In October 1980 a Fleet Management Services Team began a study of the
training management systems at three of the Royal Navy's apprentice
training establishments. The aim was to formulate system parameters
for a computer in each establishment to replace and enhance the method
of gathering, processing and retrieving information on trainee performance
and training management/assessment. For budgetary reasons the
project had to be completed in 13 weeks. It was considered impossible
to achieve this using conventional interview/analysis processes, and a
system for gathering data by questionnaire for CODAP analysis was
devised. The clustering program OVLGRP proved to be a very powerful
tool for information system analysis and the method adopted was successful
in defining the computer record and system requirements. The project
demonstrated that valid results could be obtained from a complex
questionnaire and indicated that the technique may be useful in
management/organisational studies generally.


List  of  Annexes 

A.  Questionnaire  Extract  -  Information  Usage  Section. 

B.  CODAP  Processing  Plan. 

C.  Training  Management  Computer  Record  -  CODAP  Extract 

D. Information Usage Profile - HMS FISGARD Group 6.

E. Information/Job Time Dossier - HMS FISGARD Group 6.

INTRODUCTION


1. The Royal Navy has used CODAP for job analysis as an input to training
design, Branch structuring, opinion and attitude surveys since 1973.

Recently it has been "marketed" as a useful and powerful tool for the analysis
of any manpower-based data and for any purpose. The aim of this short
presentation is to describe the use of CODAP in the analysis of an information
system, as an aid to a management services survey.

DISCUSSION 

2.  In  October  1980,  a  Fleet  Management  Services  Team  was  tasked  with  producing 
the  system  requirements  for  training  management  computers  in  3  of  the  Navy's 
apprentice training establishments. The survey was concerned with the
information generated, stored and used in the management of training from
individual details (name, date of birth, religion etc.) through individual
progress reports,
class  and  course  results  to  statistical  analysis  of  tests.  The  main  aims  of  the 
survey  were: 

a.  To  identify  the  data  to  be  held  on  the  computer  to  satisfy  the  needs 
of  training  management. 

b.  To  develop  a  model  for  the  proposed  ADP  system. 

c.  To  identify  duplication  of  information  input,  processing  and  output  to 
point  to  organisational  improvement  (and  possible  staff  savings). 

d. To identify mis-matches between the information system and
organisational structure.

There  was  one  other  aim: 

e.  To  test  the  questionnaire  and  CODAP  to  establish  whether  it  should  be 
used  for  future  similar  projects. 

3. For budgetary reasons the system requirements had to be defined
in the short space of 13 weeks from the start of the project. At discussions
between the survey team and my organisation, the RN School of Educational and
Training Technology (RNSETT), it was agreed that the only feasible method of
achieving this was to conduct a questionnaire survey supported by ADP analysis
using the CODAP suite of programs.

4.  The  use  of  CODAP  to  analyse  an  information  system  and  indeed  a  management 
organisation  was  to  us  an  innovative  extension  of  its  capability.  Its  program 
OVLGRP  (overlap  and  group)  which  clusters  individuals  together  on  the  basis  of 
similarity  of  task  performance  was  considered  to  be  potentially  a  very  powerful 
tool  for  the  job.  Questionnaire  design  and  data  analysis  were  directed  towards 
the  use  of  this  program. 
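
As a rough illustration of clustering on similarity of profiles, the sketch below computes a pairwise percentage overlap and finds the pair of respondents that would be grouped first. It assumes overlap is the sum, over items, of the smaller of the two respondents' percentages; the options actually used with OVLGRP in this survey are not described here, and the respondent names and profiles are hypothetical.

```python
from itertools import combinations


def percent_overlap(profile_a, profile_b):
    """Sum over items of the smaller of the two percentages (0 to 100)."""
    items = set(profile_a) | set(profile_b)
    return sum(min(profile_a.get(i, 0.0), profile_b.get(i, 0.0)) for i in items)


def most_similar_pair(profiles):
    """Pair of respondents with the highest overlap (the first merge)."""
    return max(
        combinations(sorted(profiles), 2),
        key=lambda pair: percent_overlap(profiles[pair[0]], profiles[pair[1]]),
    )


# Hypothetical percentage profiles over three information items.
profiles = {
    "respondent_1": {"class results": 50.0, "test statistics": 50.0},
    "respondent_2": {"class results": 40.0, "test statistics": 40.0,
                     "personal details": 20.0},
    "respondent_3": {"personal details": 90.0, "class results": 10.0},
}
first_merge = most_similar_pair(profiles)
```

A full hierarchical grouping would repeat this step, merging the most similar pair and recomputing overlaps until all respondents are joined.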

QUESTIONNAIRE DESIGN

5. The major component of the 3-part questionnaire was a section which listed
an inventory of data items potentially comprising the training information
system. These were compiled from an examination of the present systems and from
experience within RNSETT, where research and application into Computer Based
Training had been established for 2 years. 105 elements were identified,
classified into the following categories:

a.  Trainee  personal  details 

b.  Class  details 

c.  Course  details 

d.  Module  details 

e.  Examinations  and  tests 

f.  Training  documentation 

g.  Future  training  requirement 

6. Since the aims of the project were concerned principally with the
introduction of a computer, the various elements of an ADP system were used to
define activities applied to the information items. 5 functions were defined:

a.  GATHERING  -  Collecting  information  from  original  source. 

b. COMMUNICATING - Passing on information by any means.

c.  STORING  -  Recording,  filing,  keeping  information. 

d.  RETRIEVING  -  Obtaining  information  from  files  or  records. 

e. PROCESSING - Working on information to change its structure in
any way.

7. To add to the refinement of the ADP system model it was decided to qualify
involvement in any of the activities by a coarse measurement of the URGENCY
attached to its performance and the EFFORT at present needed to achieve it. In
addition, for each information element a further response capability was added to
enable respondents to indicate that the item was not readily available to them
at present, but they would like it to be (NEED). So this part of the
questionnaire developed into a matrix of rows of information elements x columns of
activities, each activity split into URGENCY and EFFORT responses, with NEED in
the final column. The 105 information items were thus expanded into 1155
response options. A copy of part of this section of the questionnaire is at
Annex A.
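The arithmetic behind the 1155 response options can be checked with a short sketch. It assumes, as the paragraph describes, 5 activities each carrying an URGENCY and an EFFORT box, plus a single NEED box per information item; the function names are illustrative only.

```python
from itertools import product

# The 5 ADP functions and the 2 qualifiers named in the text.
ACTIVITIES = ["GATHERING", "COMMUNICATING", "STORING", "RETRIEVING", "PROCESSING"]
QUALIFIERS = ["URGENCY", "EFFORT"]
N_ITEMS = 105  # information elements listed in the questionnaire


def response_columns():
    """Column headings for one matrix row: URGENCY and EFFORT per activity, then NEED."""
    cols = [f"{activity}/{qualifier}" for activity, qualifier in product(ACTIVITIES, QUALIFIERS)]
    cols.append("NEED")
    return cols


def total_response_options(n_items=N_ITEMS):
    """Rows (information items) multiplied by columns (response boxes per item)."""
    return n_items * len(response_columns())


print(len(response_columns()))   # 11 boxes per information item
print(total_response_options())  # 1155 response options in all
```

Each row therefore offers 5 x 2 + 1 = 11 boxes, and 105 x 11 = 1155.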

8. It was of course appreciated that this was a complex format and to some
extent experimental; this contributed to the aim of evaluating the system for
future use. However, the task of the respondent was eased by only having to
score 1 = yes, 2 = no in each applicable box and to make no response at all if
he was not involved in the item/activity. Some weight was also given to the
idea that the reason for the survey - the introduction of a management computer -
would provide an element of motivation to do a good job on the questionnaire!

The  other  two  parts  of  the  questionnaire  were: 

Part  1  -  Respondent  background  information,  and 

Part  2  -  Apportionment  of  Job  Time 

This latter section was included so that information usage could be compared
with job profiles and comprised 51 training tasks against which the conventional
CODAP job description was computed from scores of 1-9 indicating relative time
spent.

SAMPLE 


10. In each of the 3 establishments just over 100 officers or senior ratings
(312 in all) were identified as being significantly involved in training management.
Half of them were selected for the survey and completed questionnaires from 155
respondents were processed, distributed proportionately between the establishments.

CODAP  STRATEGY 

11.  The  keystone  of  the  CODAP  strategy  was  the  clustering  program  OVLGRP  and  the 
questionnaire  and  processing  plan  were  built  around  it. 

12. Two CODAP files were created from the original raw data file, the background
variables being common to both. The processing plan is shown diagrammatically at
Annex B. In file 1, responses to the URGENCY boxes were formatted as T (task
time) factors; EFFORT and NEED responses as H's (history variables). We were
aware that the range and meaning of responses to the URGENCY factors (1 = yes,
2 = no, 0 = not performed) could not be handled meaningfully by CODAP's job
description (JOBDEC) computation. However, we had in effect a 2-point scale
where a score of 1 or 2 indicated task performance. The job description
calculation of members performing was therefore valid, and we were encouraged
by the conclusions of PASS and ROBERTSON (NPRDC - 1980) that there would be little
loss of accuracy compared with percentage time results from a 5, 7 or 9 point
scale. A measured statement of the relative performance of training information
activities could therefore be produced easily from the JOBDEC program, by
establishment or selected group, as illustrated at Annex C. This process
established the parameters of the computer training information record.
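The "members performing" calculation described above can be sketched as follows. This is an illustration only, not the JOBDEC program itself: it assumes each respondent's URGENCY responses are held as a list of codes (1 = yes, 2 = no, 0 = not performed) and treats any non-zero code as evidence that the activity is performed. The function name and sample data are hypothetical.

```python
def percent_performing(responses):
    """Percentage of respondents performing each activity.

    responses: one list per respondent, holding a code per activity:
    0 = not performed, 1 = performed (urgent), 2 = performed (not urgent).
    """
    n_respondents = len(responses)
    n_activities = len(responses[0])
    result = []
    for activity in range(n_activities):
        # A score of 1 or 2 both indicate that the activity is performed.
        performing = sum(1 for r in responses if r[activity] in (1, 2))
        result.append(100.0 * performing / n_respondents)
    return result


# Hypothetical data: 4 respondents, 3 activities.
sample = [
    [1, 0, 2],
    [2, 0, 1],
    [0, 0, 1],
    [1, 2, 1],
]
print(percent_performing(sample))  # [75.0, 25.0, 100.0]
```

Sorting activities by this percentage gives the kind of descending "percent performing" listing referred to in the annexes.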

13. Similarly, for the clustering program OVLGRP the TIME-OVERLAP formula could
not be used, but the TASK-OVERLAP formula, which computes for each pair of
individual job descriptions the average of the ratios of common tasks to total
tasks individually performed, could be properly applied. The program was applied
to the whole file and to sub-sets of the file for each establishment. In each
case the resultant DIAGRAM was examined and nodal clusters were selected; for
each of these groups programs were run to reveal group membership and "information
usage profiles." Some of the profiles displayed remarkable bias towards one
activity; groups could be identified as being GATHERERS, RETRIEVERS or STORERS.
Annex D illustrates the point. The profile of group 6 of the HMS FISGARD diagram
shows that these 6 people have grouped because they are RETRIEVERS of information:
out of the first 75 activities in descending order of percent performing, 65 are
RETRIEVING.
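The TASK-OVERLAP measure, as worded in paragraph 13, can be sketched in a few lines. This follows the verbal description only (the average of the two ratios of common tasks to tasks individually performed); the actual OVLGRP implementation may differ in detail, and the function name and percentage scaling are assumptions.

```python
def task_overlap(tasks_a, tasks_b):
    """Percentage task overlap between two job descriptions.

    tasks_a, tasks_b: collections of the task identifiers each person performs.
    Returns the average of (common / tasks of A) and (common / tasks of B),
    expressed as a percentage.
    """
    a, b = set(tasks_a), set(tasks_b)
    if not a or not b:
        return 0.0  # no basis for comparison if either person performs nothing
    common = len(a & b)
    return 100.0 * 0.5 * (common / len(a) + common / len(b))


# Hypothetical example: A performs 4 tasks, B performs 2, both of which A also performs.
print(task_overlap({1, 2, 3, 4}, {3, 4}))  # 75.0
```

Because only the presence or absence of a task matters, the measure works with the yes/no response format, where a time-based overlap would not.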

14. Of course by the nature of the clustering formula we should not be too
surprised at this result, but the clarity of job-typing was nevertheless most
rewarding. We were able to identify the rank, organisational position and name
of group members so that their similarity of information usage could be related
to their function. This particular group comprised a Commander, 3 Lt Cdrs and
2 Lieutenants, most of whom could be classified immediately as being in positions
in the management function where retrieval of information was a logical job feature.

15. To come back to the strategy, membership of DIAGRAM groupings was used to
select job descriptions from FILE 2's job time apportionment section. Using Group
6 from FISGARD again, we could see that the prime duty area is Administration and
the details of group task performance were readily obtainable from the JOBDEC
printout.

16. Using both files the variable summary program (VARSUM) was used to produce
qualifying  data  on  each  information  item  on  URGENCY,  EFFORT  and  NEED  so  that 
deeper  analysis  of  the  system  requirement  could  be  carried  out.  The  binary 
YES/NO  response  meant  that  examination  of  the  modal  scores  could  be  carried  out 
easily. 

17. The 2-file methodology employed enabled us to produce dossiers for each of
the  selected  groups.  The  dossiers  comprised: 

a.  Group  members  by  name. 

b.  Members’  organisational  positions. 

c.  A  job  profile  histogram  of  the  group  and  average  overlap  value  of 
information  usage. 

d.  Information  involvement  and  activity  factors. 

e.  Information  activity  -  summary  of  significant  features. 

The dossier (for FISGARD Group 6 again) is reproduced at Annex E.

CONCLUSIONS 

18.  These  are  stated  with  reference  to  the  aims  of  the  project: 

a.  Identifying  data  for  the  computer  record.  The  questionnaire  and  CODAP 
produced  quickly  and  efficiently  a  listing  of  data  items;  priorities  can  be 
easily  established  and  decisions  taken  on  record  length  based  on  training 
data  usage. 

b.  Developing  the  ADP  System  model.  The  CODAP  clustering  program  and  DIAGRAM 
provide an effective means of analysing the information system. There is no
other  method  available  to  achieve  comparable  results  within  the  timescale. 
Results  would  be  improved  if  the  whole  target  population  completed  the 
questionnaire. 

c.  Identifying  duplication  of  information  activity.  Duplication  can  easily 
be  identified  through  CODAP.  However,  it  may  not  be  significant  except 
perhaps  in  the  GATHERING  function. 

d.  Identifying  mis-matches  between  the  information  system  and  Organisational 
structure.  The  CODAP  clustering  technique  and  associated  programs  provide  a 
unique  method  of  identifying  differences.  Whether  these  are  mis-matches  can 
only  be  ascertained  by  further  and  deeper  examination. 

e. Finally - testing the questionnaire and CODAP in this role. The project
was completed well within the 13 week time limit; the system has proved to
be sound, efficient and rapid and it is considered unlikely that the job could
have been done in any other way within the time scale.

RECOMMENDATION 

19. We have recommended to our management that the use of a CODAP-based
questionnaire for the recording and analysis of information systems should be
placed high on the list of support facilities to be considered for any management
survey in this field.
ANNEX E


INFORMATION/JOB TIME DOSSIER - HMS FISGARD GROUP 6

FISGARD GROUP ANALYSIS

GROUP 6 (On Part 3 Training Information DIAGRAM)

Members: 6

Commander    YOUNG      ADMIN
LT CDR       EVANS      ADMIN
LT CDR       CHAPMAN    ADMIN
LT CDR       LEWIS      ADMIN
LT           I AGES     EXECUTION

Organisational Position

Technical Education: Craft
Maths and Liberal Studies Officer
General Training Group Officer
Workshops Group Officer
Maths Instructor

[Histogram: JOB PROFILE (axis: Percentage Time Involvement)]

DATA ACTIVITY FACTOR: 172/525

PERCENTAGE OVERLAP: 51.9

DATA ELEMENT INVOLVEMENT (85/105)

                               Involvement   Potential
Trainee Personal Details            29           32
Class Details                        8            9
Course Details                      24           29
Module Details                      10           10
Examinations and Tests               6           13
TEC                                  -            1
Six Part Documentation               5            6
Future Training Requirement          3            5

ACTIVITY NOTE

Out of the first 75 activities (in descending order of % performing), 65
are "RETRIEVING".

Berkowitz, Melissa, US Army Research Institute for the Behavioral and
Social Sciences, Alexandria, Virginia. (Thurs. P.M.)


AREIS: A Computer-Based Educational Counseling System for the Army


Soldiers need information about career progression and educational
options to maximize their professional development and their likelihood
of remaining with the Army. Minimal information is provided by Army
Continuing Education System counselors due to the volume of information
needed to make career decisions and the limited time available to
counsel each soldier.

This paper will discuss one solution to managing the vast quantities
of career development information through the use of an individualized
computer-based career counseling system. The US Army Research
Institute, with contract support, has produced a prototype computer-based
system called the Army Education Information System (AREIS).
AREIS will assist the soldiers in defining work-related interests,
skills and values to prepare them to identify their educational or
vocational goals. The AREIS will also maintain a data bank for Education
Center personnel so that career data can be compiled for planning
and reporting purposes.

AREIS: A Computer-Based Educational Counseling System for the Army


Melissa  S.  Berkowitz,  Ph.D. 

Basic  Skills  Instructional  Systems  Technical  Area 
US  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 


The  views,  opinions,  and/or  findings  contained  in  this  report  are  those  of  the 
author  and  should  not  be  construed  as  an  official  Department  of  the  Army  position, 
policy,  or  decision,  unless  so  designated  by  other  official  documentation. 
AREIS: A Computer-Based Educational Counseling System for the Army¹


INTRODUCTION 


BACKGROUND 

One  objective  of  the  US  Army  is  to  produce  a  combat-ready  force  through 
the  development  of  personal  skills  and  military  proficiency.  The  Army 
Continuing  Education  System  (ACES)  supports  this  objective  by  providing 
educational  opportunities  to  soldiers  and  enabling  them  to  develop  career 
goals  that  include  military  service  and  post-service  education  and  training. 

Army Education Centers (AEC) have been established at every Army post having
a minimum of 750 military members. These Centers provide programs which
1) satisfy the skill development and occupational needs of the Army, 2)
increase soldier potential, 3) enhance job satisfaction, and 4) increase
personal educational growth. Specific offerings for academic education
include the Basic Skills Education Program (BSEP), Advanced Skills Education
Program (ASEP), High School Completion Program, and Servicemembers Opportunity
Colleges Associate Degree (SOCAD) Program. In skill development the AEC
provides language, military occupational specialty (MOS), and occupation-oriented
courses. In the area of skill recognition the Army Apprenticeship Program
and Defense Activity for Non-traditional Education Support (DANTES) Certificate
Training are offered. The AEC also provides education services which include
counseling, testing, and the support of a learning center.

The Adjutant General (TAG) supervises ACES and develops policy and guidance.
Each installation/community commander conducts the ACES program through
the Education Services Officer (ESO) so that educational and vocational
opportunities described in the previous paragraph are made available to all service
members. The ESO supervises the provision of these programs through contract
instructors and the work of Education Center counselors. The counselors are
required to provide each service member with program information and counseling
during initial training, within 30 days of arrival at new duty stations,
annually during the first enlistment, and 30 days prior to separation. The
counseling emphasizes military professional development, educational opportunities,
Veterans Educational Assistance Program (VEAP) policies, and postservice
educational benefits.

1. This project was performed for the US Army Research Institute for the
Behavioral and Social Sciences by the DISCOVER Foundation, Inc. under contract
MDA 903-79-C-0279.

The primary means of delivering information about educational and vocational
opportunities rests with the Education Center counselors. Two developments
have  hampered  the  provision  of  services  by  the  Education  Center  staff:  the 
increasing  quantity  and  complexity  of  educational  and  vocational  options,  with 
a  resultant  explosion  of  resource  information;  and  the  reduction  in  the  number 
of  Education  Center  counselors.  Education  Centers,  increasingly  understaffed 
for  the  increased  workload,  are  experiencing  difficulty  in  adequately  serving 
their  constituency. 

Hence it has become evident that other means of supplying standardized,
up-to-date, easily accessible educational and vocational information are needed.
One such means is a computer-based information system. Over the past two
decades, a growing number of guidance professionals have become increasingly
committed to the use of the computer to assist with the access and delivery of
individualized educational and vocational information (Katz & Shatkin, 1980).
The unique capabilities of the computer to store, search, retrieve, and update
large masses of information; to relate educational and vocational data to
information about the user; to simulate an interactive dialogue; and to serve many
users simultaneously with tailored information have validated the worth of this
technological aid to the counseling process.

The  computer-based  information  system  is  intended  to  function  in  concert 
with,  not  instead  of,  the  activities  performed  by  guidance  counselors.  As  the 
computer carries out information retrieving and dispensing functions and clerical
duties, counselors would gain time to perform the professional duties for
which  they  were  trained  and  for  which  they  are  needed  —  one-to-one  interviewing, 
group  guidance,  and  consultation. 

This paper will discuss the design and on-going development of a computer-based
educational and vocational information system, which is one effort to overcome
the increase in guidance information and the decrease in counseling personnel
in the military.


AREIS  Army  Education  Information  System 

Needs Assessment. The US Army Research Institute for the Behavioral and Social
Sciences (ARI) initiated a research effort to conceptualize and develop a
prototype computer-based system which would provide information on military and
civilian education programs related to the Army career progression. This effort,
performed by the DISCOVER Foundation, Inc., under contract MDA 903-79-C-0279,
provided a design for the Army Education Information System (AREIS) based on the
results of a needs assessment survey administered to Education Services Officers
(ESOs) and Education Center counselors at posts worldwide (Harris-Bowlsbey &
Raybush, in press).

The needs assessment instruments were designed to collect data concerning
1) demographic information about the Education Center, 2) the variety and
frequency of information requested by soldiers at the Education Center, and 3) ESO
and counselor attitudes about using computers. The instruments were distributed
to all major commands. The return rate for the ESOs was 72% with 131 of 162
questionnaires mailed back. The return rate for counselors was 64% with 313
of 494 counselors responding.

The  following  summarizes  the  demographic  data  supplied  by  the  ESOs  and 
counselors.  Of  the  144  posts  responding  the  permanent  population  ranged 
from  50  to  48,000.  The  number  of  counselors  per  post  ranged  from  2  to  12. 

Data  indicated  that  each  counselor  annually  serves  between  1,000  and  2,000 
soldiers.  The  average  workload  is  1,600  soldiers  per  counselor. 

Counselors indicated that half of their time is spent on one-to-one
counseling of soldiers with the remainder distributed over administrative
duties, orientation/outreach programs, clerical duties, liaison efforts,
research and development, and other miscellaneous tasks. Counselors provide an
average of two interviews per soldier per year. This represents approximately
64% of their workload. Counselors and ESOs ranked tuition assistance, college
course offerings on or near post, and information about tests (DANTES, SAT,
CLEP) as ACES program information they were most frequently asked about. In
declining order of frequency they were asked about orientation to the Education
Center services, associate degree programs, college credit for military
experience, and BSEP. In ranking information requested about career planning,
counselors and ESOs indicated the following in descending order of frequency:
developing a personal career plan in and beyond the military, assessing interests,
and making the transition from a military to a civilian job.

ESOs and counselors also responded to a series of questions to determine
their attitudes about the usefulness of computerization of ACES information now
available in print form. They indicated that computerization of information
about new and existing ACES programs, Department of the Army regulations, master
schedule of courses, and MOSs and civilian occupations would be considerably
useful. Counselors and ESOs agreed that a computerized system would provide soldiers
with consistent information and would most likely be used frequently by soldiers.
They also agreed that this type of system would be welcomed by counselors
because it would enable them to counsel more soldiers by reducing their
administrative workload. Counselors also indicated a need for training on the use of
a computerized system. In general, counselors and ESOs were positive about the
usefulness of this type of system as a tool to support Education Center
operations.

AREIS Specifications. The results of the needs assessment provided DISCOVER
Foundation,  Inc.  with  the  data  necessary  to  formulate  a  conceptualization  of  a 
computer-based  guidance  system  designed  specifically  for  Education  Center  use. 

The AREIS is composed of four interactive subsystems. Subsystem I is the
ORIENTATION which is the entry point for the soldier. The objectives of this
subsystem are to 1) familiarize the user with the computer terminal and printer,
2) provide instruction about the content of the AREIS, 3) explain the Education
Center services and 4) provide an overview of all ACES programs. The second
subsystem is SELF-INFORMATION which has been designed to help soldiers generate
information about themselves to formulate short or long range goals for their
active duty and beyond. Subsystem II helps soldiers define their work-related
interests, aptitudes, skills and values. Subsystem III GOALS and PLANNING helps
soldiers identify goals related to career and education and provides details of
ACES programs which can help them achieve their goals. This subsystem provides
information on the following goals:

1.  to  improve  basic  skills 

2.  to  develop  new  interests  for  self-improvement  or  use  of  leisure  time 

3.  to  get  some  job  skills 

4.  to  complete  the  next  step  in  education 

5.  to  plan  a  military  career 


6. to improve MOS proficiency

7.  to  select  a  secondary  MOS 

8.  to  get  promoted 

9.  to  make  a  good  decision  about  re-enlistment 

10.  to  make  a  vocational  choice 

11. to complete an educational degree after leaving the military

12. to make the Army a career

Subsystem IV COUNSELOR-ADMINISTRATOR has been designed to reduce the clerical
workload of counselors and provide them with information to be used during
counseling interviews. The subsystem contains data files which include descriptions
of MOSs, civilian occupations, and educational opportunities. These files may be
accessed directly by the soldier through interactive dialogue or by the counselor.
A second part of this subsystem, accessible only by Education Center personnel,
contains the Soldier Educational Development Record (DA Form 669) for each soldier,
a master schedule of courses offered on or near the post, and all course rosters.

AREIS Field Tryout. A field tryout of portions of the four AREIS subsystems was
conducted at the Ft. Sill, Oklahoma Education Center in April 1980. The following
segments, which represent approximately one-third of the total system as specified
above, were tested:

1.  Subsystem  I  ORIENTATION:  an  overview  of  AREIS,  Education  Center  services 
and  ACES  programs. 

2. Subsystem II SELF-INFORMATION: on-line administration of the UNIACT
Inventory (© 1978, American College Testing Program). This is a sixty-item
interest inventory which provides the respondent with a family of
occupations to examine.

3. Subsystem III GOALS and PLANNING: the goal entitled "To complete the
next step in Education," designed to provide information about
educational offerings on or near a specific Army post.

4.  Subsystem  IV  COUNSELOR-ADMINISTRATOR:  a  demonstration  of  administrative 
documents  which  may  be  maintained  by  computer  such  as  DA  Form  669, 
master  schedule  of  courses,  and  summary  report  data. 

The preceding segments of the AREIS subsystems were programmed in PLANIT
(Programming Language for Interactive Teaching) on the Army's UNIVAC 1108 Computer
at the Edgewood Arsenal, Maryland and delivered to Ft. Sill in a time sharing
mode.

Twelve counselors and sixty-four soldiers participated in the field tryout.
The soldiers were volunteers who had come into the Education Center for
information. On-line surveys were given to the soldiers prior to using the AREIS and
after each subsystem to determine their attitudes on the usefulness, clarity,
and interest level. The computer and the AREIS content were perceived as useful by
the soldiers for educational and vocational planning. Counselors indicated that
the information provided by the AREIS subsystems was useful and accurate. They
responded favorably to the delivery of educational information to soldiers by
computer.

Future Direction for AREIS. Recently a contract was awarded to the DISCOVER
Foundation, Inc. to complete the AREIS subsystem development and conduct a field
trial of the system at three Army sites. A cost/benefit analysis of alternate
delivery systems for the AREIS, conducted under the previous contract, has guided
the selection of a micro-computer for the AREIS hardware. The hardware systems
compared for the cost/benefit analysis were the maxi-computer, distributed network
of mini-computers, and stand-alone micro-computer. The micro-computer was
recommended because it has the greatest cost feasibility, requires a minimum of
technical and clerical support, is easily operated by non-technical personnel, and can
be readily installed overseas. The AREIS hardware will consist of a
micro-computer, color monitor, and printer. Some limitations of the PLANIT authoring
language were identified during the field tryout. These include the inability
to search data files, clear the screen completely, and remain in contact with
the computer after a five minute delay between users. Another authoring language
which does not have the above limitations will be selected for the software
development.

AREIS support documentation will be prepared to provide a User's Manual and
a program for in-service training of ESOs, counselors, and clerical personnel.
AREIS subsystem software will be completed according to the conceptualization
discussed previously. An in-depth field trial of the total system will be
conducted at three Education Centers: Ft. Meade, Maryland; Ft. Gordon, Georgia;
and Heidelberg, Germany. The field trial will provide data on the use patterns
of all subsystems, user reactions, influence of system use on soldiers, and
impact of the system on ESOs, counselors, and clerical staff. This effort will
be completed during the second quarter of FY 83.

CONCLUSION 

ARI is guiding the development and field trial of the AREIS as one solution
to the problem of the surge of educational and vocational information being
provided by a decreasing number of Education Center counselors. A preliminary
field tryout has indicated that the application of computer technology to
Education Center operations is most welcome by ESOs, counselors, and soldiers. The
potential payoff for the AREIS may be observed in increased soldier potential,
job satisfaction, and personal educational growth while supporting the
occupational needs of the Army in the defense of the nation.
Summary

For the purpose of adaptive intelligence measurement (CAT)
a model is proposed which uses response-time data for
estimating person parameters and item parameters on one
scale. It is assumed that there is a close connection
between two relations: ability and difficulty on the
one hand, response-time and critical response-time (CRT)
on the other hand. The CRT of an item is defined as the
mean RT of subjects whose ability equals the difficulty of
this item. In such test situations the probability of a
correct answer equals .5. Relevant subject-item combinations
(i.e. critical response times) can be identified by rank-ordering
the lines and columns of a solution-matrix according
to sums correct. The border (diagonal) between the correct
and incorrect triangle of the matrix is the guide-line of
the model.
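The rank-ordering step mentioned in the summary can be sketched as follows. This is only an illustration under stated assumptions: the solution matrix is taken to hold 1 for a correct and 0 for an incorrect answer, with subjects as rows and items as columns, and both are sorted from highest to lowest sum correct. The function name and the sort direction are choices made here, not taken from the paper.

```python
def rank_order(matrix):
    """Reorder a 0/1 solution matrix by sums correct.

    Rows (subjects) are sorted from most to fewest correct answers,
    columns (items) from most often to least often solved.
    Returns (row_order, col_order, reordered_matrix).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    # sorted() is stable, so ties keep their original relative order.
    row_order = sorted(range(n_rows), key=lambda r: -row_sums[r])
    col_order = sorted(range(n_cols), key=lambda c: -col_sums[c])
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return row_order, col_order, reordered


# Hypothetical data: 3 subjects x 3 items.
solutions = [
    [0, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
]
rows, cols, ordered = rank_order(solutions)
for line in ordered:
    print(line)
```

After reordering, correct answers collect in one triangle of the matrix and incorrect answers in the other; the cells along the border between them are the subject-item combinations for which ability and difficulty are roughly matched.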


Explanation  of  the  model 

Adaptive  testing  makes  no  use  of  the  sum  of  correct  answers, 
because  most  of  the  subjects  have  the  same  sum  correct, 
derived  from  different  item  samples.  A  single  correct  (or 
incorrect)  answer  gives  us  only  a  very  rough  bit  of  information. 
In fact it tells us nearly nothing. It needs very complicated
scaling methods (as provided for instance by item-response-theory)
to derive estimates of person-parameters from binary
data. I propose to make use of an additional source of data:
of item response-time.

Dependent variables like intelligence scores may be determined
by a host of different independent variables. Most of these
we don't want to be mixed up with variables which we try
to measure. Because in practice we cannot identify and
subtract the so-called error variance from the empirical
scores and get true scores, we might as well forget about
the error variance in the phase of test construction and test
application and take account of it in the phase of test
interpretation.

For the measurement itself I assume that an item-response x
of subject s to item i is merely a function of the relation
between the ability A of subject s and the difficulty D
of item i.

$$x_{si} = f(A_s, D_i)$$

From a correct answer ($x_{si} = 1$) I conclude that ability $A_s$ is
greater than difficulty $D_i$. And from an incorrect answer
($x_{si} = 0$) I conclude that ability $A_s$ is smaller than difficulty
$D_i$.

With a given subject-item combination, the probability of a
correct response depends on the relation between the respective
ability and difficulty.

$$p_{si} = f(A_s, D_i)$$

The probability $p_{si}$ increases with increasing ability and/or
decreasing difficulty.

Subject and item are independent components of the test
situation. Therefore ability and difficulty may be described
as ordinate and abscissa, or as short sides of a right-angled
triangle.

In Figure 1 the relation between $A_s$ and $D_i$ is the cotangent
of angle $\alpha_{si}$.

The smaller the angle the greater the probability of a
correct answer.

The  basic  assumption  of  my  model  says,  that  there  is  a 
correlation  between  item- response- time  and  angel 
In  other  words: 

I  The  response  time  t  A  needed  by  subject  s 

to  solve  item  i  is  a  function  of  the  relation 
between  the  ability  Ag  and  the  difficulty 
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The  most  simple  relation  between  item- response- time  and  probability 
of  a  correct  answer  would  be  a  linear  function.  But  I  don't  expect 
things  to  be  as  aiaqple  as  that  in  reality  (see  Figure  2). 


Fig.  2 


To every item there is a minimum response time which may perhaps
be reached at a probability level of .8 or higher. If on the
other hand the probability of a correct answer is very low, i.e.
if the item is much too difficult for the subject, response time
will in many cases be relatively low, because the subject
realizes that he has no chance and gives up or guesses. So only
in the middle region, where the probability of a correct answer
is about .5, I expect $t_{si}$ to be a linear function of $p_{si}$: the
higher the probability the shorter the response time. Suppose
now that we have a second subject c but the same item. At
least with medium probability of a correct answer (CAT-region)
I expect the following equations to hold true:

Let subject c be a special subject, i.e. a subject whose
ability equals the difficulty of the item.

In this case the angle $\alpha_{ci}$ (Fig. 1) will be 45°.

Definition:

The critical response time $t_i$ of item i is the mean
response time of those subjects whose ability equals the
difficulty $D_i$. In such cases the probability of a correct
answer to item i is equal to .5.

The relation between the individual response time of subject s
answering item i and the critical response time of item i
determines the size of angle $\alpha_{si}$. The cotangent of this angle
is the relation between the ability of subject s and
the difficulty of item i.

$$\frac{A_s}{D_i} = \cot \alpha_{si} = \cot\left(45^\circ \cdot \frac{t_{si}}{t_i}\right)$$   (Formula 1)

    A   = ability (ordinate)
    s   = subject
    D   = difficulty (abscissa)
    i   = item
    t   = response time
    t_i = critical RT

For practical use this equation can be transformed into the
following versions:

$$A_s = D_i \cdot \cot\left(45^\circ \cdot \frac{t_{si}}{t_i}\right)$$   (Formula 2)

$$D_i = A_s \cdot \tan\left(45^\circ \cdot \frac{t_{si}}{t_i}\right)$$   (Formula 3)
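
As a numerical illustration of Formulas 2 and 3, the following is a minimal Python sketch. The function names and the range check are illustrative assumptions and not part of the model description; the check only reflects that the cotangent gives a positive, finite value for angles strictly between 0° and 90°, i.e. for response times below twice the critical RT.

```python
import math


def alpha_degrees(t_si, t_i):
    """Angle alpha_si = 45 * (t_si : t_i), in degrees."""
    alpha = 45.0 * t_si / t_i
    if not 0.0 < alpha < 90.0:
        # Outside this range tan/cot is zero, infinite or negative.
        raise ValueError("response time must lie between 0 and 2 * critical RT")
    return alpha


def ability_from_difficulty(d_i, t_si, t_i):
    """Formula 2: A_s = D_i * cot(alpha_si)."""
    return d_i / math.tan(math.radians(alpha_degrees(t_si, t_i)))


def difficulty_from_ability(a_s, t_si, t_i):
    """Formula 3: D_i = A_s * tan(alpha_si)."""
    return a_s * math.tan(math.radians(alpha_degrees(t_si, t_i)))
```

Under these formulas a subject who answers in exactly the critical RT is assigned an ability equal to the item difficulty, and a subject who needs only half the critical RT is assigned cot(22.5°), about 2.41 times the difficulty.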


Critical response time (CRT)

The critical RT of an item was defined above as "the mean RT of
those subjects whose ability equals the difficulty of the item".
How can we determine CRT if we know neither ability nor
difficulty?


In test situations where ability and difficulty are equal, the
probability of a correct answer is equal to .5. This we can
estimate. The probability that subject s will solve item i
correctly has two aspects. There is

-  an active probability $p_s$ of subject s to solve correctly, and

-  a passive probability $p_i$ of item i to be solved correctly.

In a solution matrix these two aspects correspond to the
relative frequencies of correct answers in lines and columns
of the matrix.


The following checklist outlines the procedure for estimating
the critical response times for the items:

-  item sample and subject sample should ideally be evenly
   distributed with respect to ability and difficulty

-  give all items one by one in random order to all subjects
   on a screen

-  measure the time from item presentation to key-pressing

-  set up a solution matrix and a response-time matrix

-  rearrange the lines and columns of the solution matrix
   so that they both form a rank order according to sum
   correct (see simplified example next page)

-  rearrange the $t_{si}$-matrix in the same way as the $x_{si}$-matrix

-  make out the $p_{si}$-matrix ($p_{si} = (p_s + p_i) : 2$)

-  compare the $p_{si}$-matrix with the $t_{si}$-matrix and find in every
   item column (if necessary by interpolation) the $t_i$-estimate
   which corresponds to $p_{si} = .5$; if there are two or more
   relevant estimates for $t_i$, compute the mean.
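
The checklist can be sketched in Python roughly as follows. This is one possible reading under stated assumptions, not a definitive implementation: the matrices are plain nested lists indexed `[subject][item]`, a missing response time is `None`, all recorded response times are used, interpolation is linear between the nearest probabilities on either side of .5, and the rank-ordering step is skipped because it only affects how the matrices are displayed. The function names are hypothetical.

```python
from statistics import mean


def p_matrix(x):
    """p_si = (p_s + p_i) / 2 from a 0/1 solution matrix x[s][i]."""
    n_subj, n_items = len(x), len(x[0])
    p_s = [sum(row) / n_items for row in x]  # share correct per subject (line)
    p_i = [sum(x[s][i] for s in range(n_subj)) / n_subj
           for i in range(n_items)]          # share correct per item (column)
    return [[(p_s[s] + p_i[i]) / 2 for i in range(n_items)]
            for s in range(n_subj)]


def critical_rt(x, t):
    """Estimate the critical response time t_i of every item (None if impossible)."""
    p = p_matrix(x)
    estimates = []
    for i in range(len(x[0])):
        pairs = [(p[s][i], t[s][i]) for s in range(len(x)) if t[s][i] is not None]
        exact = [rt for prob, rt in pairs if abs(prob - 0.5) < 1e-9]
        if exact:
            # Two or more relevant estimates: take the mean.
            estimates.append(mean(exact))
            continue
        below = [(prob, rt) for prob, rt in pairs if prob < 0.5]
        above = [(prob, rt) for prob, rt in pairs if prob > 0.5]
        if not below or not above:
            estimates.append(None)  # .5 is not bracketed in this column
            continue
        p_lo = max(prob for prob, _ in below)
        p_hi = min(prob for prob, _ in above)
        t_lo = mean(rt for prob, rt in below if prob == p_lo)
        t_hi = mean(rt for prob, rt in above if prob == p_hi)
        weight = (0.5 - p_lo) / (p_hi - p_lo)
        estimates.append(t_lo + weight * (t_hi - t_lo))
    return estimates
```

An item whose column contains no probabilities on both sides of .5 gets no estimate here, which mirrors the situation where an item is too easy or too hard for the whole subject sample.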

Item calibration


Once we have computed the critical response-time values of the
items, we can make out the $\alpha$-matrix ($\alpha_{si} = 45 (t_{si} : t_i)$),
on the basis of which ability and difficulty can be estimated
by use of formulas 2 and 3 (see the continuation of our example,
next page). But first we must decide where the parameter
scales are to be anchored.

In our example we might arbitrarily state that subject No. 1 has
an ability score of $A_1 = 100$. Mathematically it would be possible
to compute nearly all difficulty scores by use of response times
or alpha values in the matrix line of subject No. 1. (Only item
3 cannot be calibrated, because we have no t-value for it.) But it
is not advisable to do it this way.

For the purpose of item calibration we should make use only
of those test situations where the respective item was given
to a subject whose ability is about as high as the difficulty
of the item and who has solved the item correctly. As mentioned
above, I expect response-time data to be relatively invalid
indicators for the ability/difficulty relations if there is
a great difference between $A_s$ and $D_i$. So only the $t_{si}$-values
in the diagonal region of the matrix should be used. That means
that we should start with the anchor subject (or anchor item)
and go along the diagonal in both directions, estimating
alternately A-values and D-values.
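
One way to read this walk along the diagonal is sketched below in Python. It is an illustration under several assumptions, not the calibration procedure itself: rows (subjects) and columns (items) are already rank-ordered, the anchor is the first subject so the walk runs in one direction only, every cell used holds the response time of a correct answer, and the cells used are the diagonal cell `(k, k)` and the cell just below it `(k+1, k)`. All names are hypothetical.

```python
import math


def _tan_alpha(t_si, t_i):
    """tan of alpha_si = 45 * (t_si : t_i) degrees, for 0 < alpha < 90."""
    alpha = 45.0 * t_si / t_i
    if not 0.0 < alpha < 90.0:
        raise ValueError("response time must lie between 0 and 2 * critical RT")
    return math.tan(math.radians(alpha))


def calibrate_along_diagonal(t, t_crit, anchor_ability=100.0):
    """Alternate between Formula 3 (D from A) and Formula 2 (A from D).

    t[s][i] is the response time of subject s on item i, t_crit[i] the
    critical response time of item i.
    """
    steps = min(len(t), len(t_crit))
    abilities = [anchor_ability]
    difficulties = []
    for k in range(steps):
        # Formula 3: difficulty of item k from the ability of subject k.
        difficulties.append(abilities[k] * _tan_alpha(t[k][k], t_crit[k]))
        if k + 1 < len(t):
            # Formula 2: ability of subject k+1 from the difficulty of item k.
            abilities.append(difficulties[k] / _tan_alpha(t[k + 1][k], t_crit[k]))
    return abilities, difficulties
```

Because each estimate is built on the previous one, errors accumulate along the chain; that is a property of this particular sketch and one reason to keep the chain within the diagonal region.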


Application 

The model was worked out for computerised adaptive testing (CAT).
It makes it possible to estimate a subject's ability on single
items. A calibrated item represents a point on the difficulty
scale. The ability of a subject is assumed to be greater or
smaller than this value, depending on whether the answer is
correct or incorrect. This information can be made more specific by
taking response time into account and estimating ability with formula 2.

So the item does not only define an ability limit, but measures
a certain ability range adjacent to its difficulty value. Therefore
item pools for the practice of CAT can be smaller than they are
required to be for other methods used today. I hope that this statement
will be confirmed by empirical investigations.


The  Maintenance  Performance  System 


Dr.  Douglas  J.  Bobko 
Research  Psychologist 
Army  Research  Institute 
Alexandria,  Virginia 


The Maintenance Performance System (MPS) is a computer-based management
system designed to improve the conduct and quality of maintenance
training at the direct support level; it has been in operation at a
FORSCOM maintenance battalion for nearly one year. An evaluation of the
system suggests that MPS monitors daily maintenance activities with
relative ease and accuracy. Moreover, MPS-generated skill and training
information is used by shop personnel to diagnose skill deficiencies and
to guide technical training of individual repairmen.

Background


The  proper  and  continuous  maintenance  of  Army  vehicles  and  other 
equipment  is  crucial  to  achieving  combat  readiness.  However,  a 
number  of  reports  have  indicated  that,  at  both  the  organizational  and 
direct  support  level,  equipment  maintenance  procedures  are  less  than 
optimal.  Typical  of  recurring  problems  in  maintenance  operations  are 
these:  faulty  diagnosis,  failure  to  recognize  equipment  failures, 
unnecessary  part  replacement,  and  failure  to  conduct  on-the-job  training. 

In  an  effort  to  help  correct  these  maintenance  deficiencies,  and 
with  the  assumption  that  effective  maintenance  training  helps  ensure 
efficient maintenance operations, the Army Research Institute (ARI)
developed the Maintenance Performance System (MPS).

The  Maintenance  Performance  System 

The  primary  goal  of  MPS  is  to  improve  the  conduct  and  quality  of 
on-the-job  training  of  repairmen  in  their  performance  of  technical  tasks 
at  direct  support  battalions  (a  more  extensive  system  for  use  at  the 
organizational level is now being designed). To accomplish this, MPS
provides  unit  management  with  current  information  about  the  experience, 
training  and  skills  of  maintenance  personnel,  and  then  guides  the  manager 
toward  selection  of  available  training  methods  and  materials  to  meet 
skill  and  performance  needs. 

Integration  Into  Normal  Operations 

The structure of MPS and its integration into normal operations are
shown in Figure 1. Daily changes in both equipment status and repairmen
experience are recorded and fed into a computer file; in turn, information
about shop performance on an aggregate and individual level is prepared
in report form. These reports from MPS are given to battalion and
company level managers who use them to diagnose maintenance performance
deficiencies. Managers can then use specific system-derived guidance to
match training resources with training needs and to develop an
on-the-job training program.

It is important to note that MPS differs from the current Army
Maintenance Control System (MCS). MCS is primarily designed to provide
information  about  the  status  of  equipment  as  it  passes  through  the  shop 
and,  unlike  MPS,  MCS  provides  no  information  about  individual  and  unit 
skills,  experience,  or  training. 


Description of System

MPS has been in use for nearly one year by two forward support companies of
a divisional FORSCOM maintenance battalion. It currently includes ten Military
Occupational Specialties (MOS) and almost all of the equipment serviced by the
companies. The MOSs include the high density MOSs of 63H (track vehicle
repairmen), 63W (wheel vehicle repairmen), and 45K (tank turret repairmen).

Technical  MOS  supervisors  feed  information  into  the  system  through 
a set of five simple input forms. Based on observations to date, supervisors
spend about ten minutes each week completing them. The information
required  on  each  form  is  described  in  Table  1. 


TABLE 1. DATA COLLECTION FORMS

Input
Form   Title                    Use

1      Job Order Status         One for each job; records time of
                                each change of job status

2      Job Performance          Records amount of time each
                                repairman works on each job

3      Daily Man-Hour           For each repairman, records
       Availability             available hours, direct hours
                                and overtime

4      Training/Performance     Records accomplished training
       Demonstration            or testing for each repairman

5      Task Experience          Repairman entering system reports
                                prior experience on each task


Day  to  day  operation  of  MPS  is  accomplished  by  a  junior  enlisted 
clerk  in  grade  E3  or  E4.  The  clerk  is  responsible  for  collecting  the 
input forms, key encoding the data into the computer (presently an IBM
5120), generating computer-printer reports, and distributing the reports.
The  clerk  works  full  time  at  these  duties. 


In all there are nine different MPS reports and two tables (see
Table 2). One table lists the complete roster of each company and is
distributed to the company office. It is used to keep track of the
number and type of personnel in the unit on a bi-weekly basis. The
other table contains a list of comments about local conditions that
may affect the interpretation of reports. For example, this table would
record when training holidays occur, because training holidays would
affect the time and direct labor available for maintenance.

There are five different reports which contain data that reflect the
operational status of the unit, and these are delivered to relevant
personnel on a bi-weekly schedule. Maintenance managers use information
about man-hour use, time-to-repair, and time in each job status (i.e.,
awaiting parts, in-shop, etc.) to locate deficiencies in maintenance
performance and other problems. Maintenance managers are guided in the
diagnosis of problems by an "Interpretation Guide" that is specifically
keyed to MPS.

Four  other  reports,  which  are  produced  every  six  weeks,  contain  data 
that  are  particularly  relevant  to  the  analysis  of  training  needs  and  the 
guidance  of  training  activities.  For  example,  the  Individual  Skill  History 
report  contains  a  record  of  each  individual's  experience  on  each  MOS  technical 
task.  Experience  on  a  task  includes  the  job  exposure  that  occurs  when  a 
soldier  is  given  a  task  to  perform,  more  formal  training  that  is  provided 
to  train  or  refresh  skills,  and  demonstrated  proficiency  on  a  performance- 
based  test.  The  Individual  Skill  History  report  is  essentially  an  automated 
procedure  for  recording  the  tasks  that  each  soldier  performs  and  is  much 
like  a  job  book.  It  also  serves  as  the  basis  for  the  Training  Requirements 
Summary  report  which  actually  guides  training  activity.  This  report  lists 
those  individuals  who  have  insufficient  experience  on  each  task  and  ranks 
the need for training according to the degree of experience and the
criticality of the task. Training managers can use this report to identify
those  individuals  who  most  need  training  on  each  task. 

TABLE 2. MPS REPORT INFORMATION

Table 1     Roster
Table 2     Interpretation Comments
Report 1    Man-hour Availability and Use
Report 2    Average Direct Man-hours Per Job
Report 3    Average Direct Man-hours Per Job by Equipment and Task
Report 4    Average Job Throughput Time in Days
Report 5    Average Days Spent in Each Job Status
Report 6    Skill and Growth Indices
Report 7    Skill Development Summary
Report 8    Individual Skill History
Report 9    Training Requirements Summary


Evaluation of MPS


MPS has been in operation for several months without the assistance
of  or  supervision  by  ARI.  Several  visits  and  observations  during  this 
time  have  revealed  that  MPS  functions  smoothly  on  a  day-to-day  basis, 
e.g.,  forms  are  routinely  attached  to  job  orders  and  filled  in,  data  is 
collected  and  entered  into  the  computer  regularly,  and  reports  are 
printed  and  distributed  appropriately.  A  more  detailed  analysis  of  MPS 
was  conducted,  however,  to  evaluate  the  accuracy  and  validity  of  MPS 
information,  and  to  determine  the  extent  to  which  MPS  training  reports 
affect the conduct and content of skill training in the maintenance shops.

The  first  objective  of  this  evaluation  was  to  determine  the  accuracy 
of  information  being  entered  onto  the  more  critical  MPS  forms.  This  was 
done  in  light  of  the  fact  that  MPS  report  interpretation  and  consequent 
command  action  is  predicated  on  confidence  in  the  accuracy  of  MPS 
statistics;  data  input  quality  is  a  fundamental  problem  for  any  such 
system. 

A second objective of the evaluation was to assess the validity of
MPS-generated individual skill indices. The skill index, which is assigned
to each repairman, can range from 0 to 100 and it reflects the degree to
which a repairman is proficient on those tasks covered by his MOS; the
skill index increases each time a repairman works on a job, receives
training, or passes a performance-based test on a job. (The skill index
is essentially a single number describing the content of the Individual
Skill History report.)

Skill  indices  are  an  integral  part  of  MPS  for  a  variety  of  reasons. 

For  battalion  and  company  commanders,  they  point  to  current  strengths 
and  weaknesses  in  overall  maintenance  capability,  and  provide  a  measure 
of  the  growth  in  maintenance  proficiency.  For  shop  supervisors,  the 
skill index is an easy reference to those repairmen who require additional
training. To the individual repairman who receives a skill
growth  report  every  six  weeks,  the  skill  index  can  serve  as  a  source  of 
reward  and  motivation. 

The  third  objective  of  this  evaluation  was  to  determine  the  extent 
to  which  MPS  reports  of  job  experience  are  used  for  job  assignments  and 
for  setting  training  priorities. 


Accuracy  of  MPS  data 


Most  of  the  critical  information  monitored  by  MPS  is  collected  on 
MPS  Forms  1,  2,  and  3.  Hence,  a  sample  of  these  forms  in  each  company 
was  examined  and  checked  for  accuracy. 


MPS Form 1 (Job Order Status). For two companies and four MOSs, a
total of 67 forms were examined. The information contained on these
forms was compared with similar information contained in the job folder
which normally follows a job through the shop; shop officers were also
questioned about data entries on this form. It was found that the
information contained on MPS Form 1 is generally consistent with that of
DA Form 2407 and with verbal reports from shop officers. The most
frequent problem was an incorrectly entered or omitted job code; this
occurred on six of the forms examined.


MPS  Form  2  (Job/Task  Performance).  A  total  of  18  Form  2s  were 
chosen  across  the  two  companies  and  for  four  MOSs;  at  the  time  of  their 
selection,  these  forms  represented  shop  work  completed  within  the  last 
48 hours. The repairmen (and their supervisors) listed on these forms
were  questioned  about  the  jobs  they  had  recently  performed,  and  the 
length  of  time  each  job  required.  In  each  case,  the  names  listed  on 
Form  2  were  consistent  with  reports  from  repairmen  and  supervisors, 
i.e.,  those  repairmen  who  performed  the  job  received  credit  for  it. 

The  accuracy  of  job  completion  time  data  was  more  difficult  to 
determine,  however.  Since  there  are  occasional  interruptions  in  work, 
and  because  start  and  stop  times  are  not  recorded  with  a  stopwatch,  job 
completion  times  are  generally  rough  estimates.  Based  on  reports  from 
repairmen  and  supervisors,  the  total  job  completion  time  is  typically 
rounded up to the nearest hour or half hour, so that the relative
magnitude of error increases as the job completion time decreases. For
this sample of forms, however, there was no instance in which repairmen's
reports of job completion time differed from MPS data entries by more
than 60 minutes.

MPS  Form  3  (Daily  Man-Hour  Availability).  Fifty-five  forms  were 
reviewed,  which  represent  both  companies  and  four  MOSs.  From  interviews 
with shop supervisors, it is clear that available man-hour entries are
accurate but that the direct man-hour entries are predominantly inaccurate.
For example, it is a common practice with Form 3 to enter 7
3  direct  man-hours  for  repairmen  who  worked  only  4  or  1  hours. 

That direct man-hour entries are inflated was confirmed by a cross-check
of Form 3 against Form 2 (which records job performance time for each
repairman).

Validity of MPS Skill Index

To  assess  the  validity  of  the  MPS  skill  index,  the  indices  were 
compared  with  independently  obtained  job  performance  ratings  of  each 
repairman.  In  short,  each  repairman  tracked  by  MPS  was  rated  by  both 
his  supervisors  and  his  peers.  Ratings  were  made  with  the  use  of  an 
anchored  scale,  where  a  0  performance  rating  indicated  a  repairman  who 
required  constant  and  total  supervision  on  a  job,  50  indicated  that  some 
supervision  was  required,  and  a  performance  rating  of  100  meant  that  the 
repairman  could  perform  a  task  successfully  without  any  supervision. 

Three ratings were obtained for each repairman: the first rating was to
indicate a repairman's general performance in his MOS, and the remaining
two ratings were to indicate his performance on two different but specific
tasks (which differed for each MOS). In all, usable ratings were
obtained  from  20  supervisors  and  50  repairmen,  representing  four  MOSs 
(63H,  63G,  52D,  45K)  and  three  companies. 

Pearson  product-moment  correlations  were  calculated  to  determine 
the  relationship  between  MPS  skill  indices  and  performance  ratings;  for 
each  rater,  a  correlation  was  obtained  between  the  performance  ratings 
he  assigned  to  repairmen  and  those  repairmen's  skill  indices.  While 
only  a  handful  (16%)  of  these  correlations  were  found  to  be  significant, 
the  majority  (36%)  of  them  were  positive.  In  all,  these  results  suggest 
a small yet positive relationship between MPS skill indices and performance
ratings.
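
The per-rater calculation described above can be sketched in Python as follows. This is only an illustration of the Pearson product-moment formula applied rater by rater; the data layout (dictionaries keyed by rater and repairman) and the function names are assumptions, and no significance test is included.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def correlations_by_rater(ratings_by_rater, skill_index):
    """One correlation per rater between his ratings and the skill indices.

    ratings_by_rater: {rater: {repairman: rating on the 0-100 scale}}
    skill_index:      {repairman: MPS skill index (0-100)}
    """
    result = {}
    for rater, ratings in ratings_by_rater.items():
        rated = [name for name in ratings if name in skill_index]
        result[rater] = pearson_r([ratings[name] for name in rated],
                                  [skill_index[name] for name in rated])
    return result
```

A rater who gives every repairman the same rating produces no usable correlation, so such cases raise an error here rather than silently returning zero.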


Effects  of  MPS  on  Training/Job  Assignment 


Structured  interviews  were  conducted  with  shop  supervisors,  team 
leaders,  and  selected  repairmen  to  determine  what  effects  MPS  feedback 
has  on  maintenance  training,  job  assignment,  and  morale  or  motivation. 

Although attitudes toward MPS as a training information source and
guide were generally positive, the actual use of MPS feedback for training
purposes was varied and apparently related to two factors, supervisory
level and MOS density: warrant officers, for example, use MPS
feedback more often than team leaders, and MPS reports play a greater
role in larger maintenance sections than in smaller ones.

MPS  skill  training  reports  were  used  for  training  purposes  in  the 
following ways: (1) to assign jobs to work teams, i.e., a team which had
done  some  job  least  often  would  be  assigned  that  job  over  other 
teams,  (2)  to  temporarily  shift  a  repairman  from  one  team  to  another  in 
order  to  gain  experience  on  some  job,  (3)  to  serve  as  a  memory  refresher 
about  which  repairmen  require  training  on  critical  skills,  and  (4)  to 
log  entries  into  job  books. 

Interviews  with  repairmen  revealed  that  individual  skill  history 
reports had little effect on morale or motivation for two reasons.
First, the skill history reports are distributed infrequently (every six
weeks)  and  second,  the  reports  list  which  jobs  have  been  performed  but 
not  the  quality  of  performance. 


Discussion 

The  primary  purpose  of  MPS  is  to  improve  the  conduct  and  quality  of 
on-the-job  maintenance  training.  Toward  this  end,  and  as  a  prototype 
system,  MPS  appears  to  have  worked  well.  It  has  been  shown  to  be  a 
system  in  which  daily  maintenance  activities  are  monitored  with  relative 
ease  and  accuracy.  Moreover,  MPS  provides  unique  maintenance  training 
information  and  guidance  which  is  used  for  training  purposes  by  shop 
personnel.

The  results  of  the  evaluation  reported  here  and  other  observations 
made  during  the  past  year  suggest  several  improvements  or  modifications 
of  MPS.  For  example,  modification  of  the  MPS  computer  program  would 
allow a calculation of direct man-hours from information on MPS Form 2,
thus increasing the accuracy and reliability of the man-hour availability
and use reports (and eliminating all or at least part of MPS
Form 3). Another modification would be to increase the frequency of
distribution of individual skill history reports, and hence increase the
potential this feedback has for motivation and morale. One other important
improvement of MPS now under development is a method of reflecting the
proficiency  with  which  tasks  are  performed;  currently,  two  repairmen 
will  receive  the  same  amount  of  credit  for  completing  a  task  regardless 
of  the  quality  of  their  work. 
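
The first of these modifications, deriving direct man-hours from Form 2 rather than from Form 3, amounts to summing each repairman's recorded job time per day. A minimal Python sketch is given below; the record layout (`repairman`, `date`, `hours`) is a hypothetical stand-in for the Form 2 fields, and the actual MPS program is not shown in this paper.

```python
from collections import defaultdict


def direct_man_hours(form2_entries):
    """Sum Form 2 job time per repairman and day.

    form2_entries: iterable of dicts with the (assumed) keys
    'repairman', 'date' and 'hours'.
    """
    totals = defaultdict(float)
    for entry in form2_entries:
        totals[(entry["repairman"], entry["date"])] += entry["hours"]
    return dict(totals)
```

The resulting totals could then replace the direct man-hour column that supervisors now estimate by hand.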

ARI is now working toward a link between MPS and interactive videodisc
instruction. A scenario for the near future is this: After signing
on at a computer terminal, a mechanic is presented with a summary of his
current work and training history. The computer then identifies the
mechanic's  training  deficiencies,  and  directs  him  toward  a  specific 
videodisc  lesson  available  for  use  on  that  terminal.  The  computer 
monitors  the  progress  of  the  mechanic  as  he  completes  the  lesson,  and 
updates  his  training  experience  accordingly. 

Finally, it should be noted that a maintenance system like MPS can
have an impact on more than the unit in which it functions. Specifically,
maintenance performance information can be fed back to the institutional
training base for use in curriculum design, and it can provide high
quality reference data to the equipment design process.




AD  P001294 




The  Strategic  Weapon  System 
Training  Program 

Part  I  -  Description 

Frank  B.  Braun 

DATA-DESIGN LABORATORIES

ABSTRACT 


This  paper  (Part  I  of  a  two  part  paper)  describes  the  Strategic  Weapon  System 
Training  Program  (SWSTP)  used  in  the  Navy’s  POSEIDON  and  TRIDENT  Submarine  Force. 

The  goal  of  the  SWSTP  is  to  provide  current,  comprehensive,  job-related  training 
for  Strategic  Weapon  Systems  personnel  in  support  of  maintaining  the  SWS  in  a 
high  state  of  operational  readiness.  The  POSEIDON  and  TRIDENT  submarine  systems 
are  required  to  operate  in  an  environment  where  the  ships  must  be  completely 
self-sufficient,  yet  maintain  their  weapon  system  in  a  constant  "up"  condition. 
This  requirement  necessitates  trained  technicians  who  can  operate  and  maintain 
their  equipment  without  outside  help. 

The  SWSTP,  developed  and  refined  over  the  past  10  years,  has  five  major  elements: 
Personnel  Performance  Profiles,  Training  Path  Systems,  Curricula,  Personnel 
Qualifications  Guides,  and  the  Personnel  and  Training  Evaluation  Program.  These 
elements and the supporting organization are described.

BACKGROUND


Training for the POLARIS Fleet Ballistic Missile (FBM) System was reviewed in
the late 1960's and was found deficient in several respects. The actual training
requirements were confusing, training pipelines were excessively long,
overtraining was common, trained personnel often were unfamiliar with current
technical publications, and no adequate method existed for evaluating training
effectiveness.

To  correct  these  problems,  the  Chief  of  Naval  Operations  directed  the  establishment 
of  the  FBM  Weapon  System  Training  Program  and  assigned  specific  responsibilities 
for  its  implementation  and  administration.  The  advent  of  the  POSEIDON  and  TRIDENT 
systems  required  further  refinements  to  the  training  program  and  the  name  was 
changed to the Strategic Weapon System Training Program (SWSTP). The Strategic
Systems  Project  Office  (SSPO)  was  designated  to  implement  and  provide  overall 
technical  control  of  the  program. 


PROGRAM ELEMENTS


The SWSTP is a systems approach to training composed of five major elements.

The  Personnel  Performance  Profiles  (PPPs)  are  comprehensive,  minimum  requirements 
listings  of  the  knowledges  and  skills  required  to  operate  and  maintain  a  system, 
subsystem  or  equipment.  The  PPPs  are  essentially  the  result  of  hardware-oriented 
task  analyses  and  are  prepared  using  current  information  from  approved  engineering 
drawings,  technical  manuals,  training  literature,  contractor  source  information 
and  promulgated  operational,  maintenance,  safety  and  training  doctrine. 

The  PPPs  are  in  the  form  of  numbered  tables  corresponding  to  the  system,  subsystem 
or  equipment  configuration  of  the  SSBN  Weapon  System.  For  example,  PPP  table 
1500 contains items which list knowledge and skills associated with a Navigation
Subsystem. The individual system, subsystem or equipment table consists
of  two  parts:  (1)  knowledge  items  representing  the  theory,  characteristics, 
functional  operation  and  procedures  involved  in  the  operation  and  maintenance 
of  the  system,  subsystem  or  equipment;  and  (2)  skill  items  representing  the 
abilities  required  to  perform  operation  and  maintenance  based  on  acquired 
knowledge. 

PPP  items  are  written  to  encompass  all  depths  (levels)  of  coverage.  No  attempt  is 
made  to  list  every  detail  of  every  type  of  knowledge  or  skill  required.  It  is 
the  responsibility  of  the  PPP  user  to  ensure  comprehensive  detail  coverage  in 
material derived from the PPPs.

The  Training  Path  System  (TPS)  assigns  the  knowledge  and  skill  items  of  the  PPPs 
to  specific  Navy  personnel  in  a  logical  order  and  to  a  defined  depth  of  knowledge 
and  level  of  skill.  For  example  the  TPS  indicates  which  PPP  table  1500  items 
are  required  in  a  particular  classif ication-e . g. ,  SINS  Technician-and  identifies 
the  courses  which  must  be  taken  to  acquire  those  skills  and  knowledge.  The 
logical  order  of  those  courses  forms  a  training  path.  The  TPS  consists  of  three 
components: (1) Training Objective Statements (TOS) which define the level of
training coverage for knowledge and skill items of the PPP for specific Navy
personnel; (2) Training Path Charts (TPC) which present graphically the PPP items
and TOS levels required for each course and the logical sequence of courses for
specific personnel; and (3) Training Level Assignments (TLA) which assign the
level of training to particular PPP items for specific personnel and also identify
the type of training (i.e., Background, Replacement, Advanced, and Onboard). In
conjunction with the PPPs, the TPS constitutes an effective management tool for
coordinating  the  development  of  training  materials  and  for  ensuring  adequate  and 
uniform  coverage  of  subject  matter,  with  a  minimum  of  duplication,  in  training 
courses  and  materials. 

Alphanumeric  codes  are  used  in  the  TOS  to  identify  knowledge  and  skill  categories 
and depth of coverage. The alphabetic portion of the code indicates the knowledge
and skill category:

Knowledge:

F - Familiarization Theory
T - Theory

Skill:

O - Operation
P - Preventive Maintenance
C - Corrective Maintenance
M - Maintenance

The numeric portion of a code will be 1, 2, or 3 (3 is the highest level reached
in any knowledge or skill). TOS "T0" is an exception and refers to the background
knowledge and skills usually taught in "A" School. An example of a training
level is "C2," which refers to advanced corrective maintenance skills used in
undocumented fault isolation procedures performed with test equipment.
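
The coding scheme just described can be sketched in a few lines. This is a minimal illustration, not part of the SWSTP documentation: the names `TosCode` and `parse_tos` are hypothetical, and the sketch assumes the background exception is written as the code T0 (letter T, level zero).

```python
import re
from typing import NamedTuple

# Category letters as listed in the text; the dictionary layout is an assumption.
KNOWLEDGE = {"F": "Familiarization Theory", "T": "Theory"}
SKILL = {
    "O": "Operation",
    "P": "Preventive Maintenance",
    "C": "Corrective Maintenance",
    "M": "Maintenance",
}


class TosCode(NamedTuple):
    letter: str    # alphabetic portion of the code
    level: int     # numeric portion of the code
    kind: str      # "knowledge" or "skill"
    category: str  # spelled-out category name


def parse_tos(code: str) -> TosCode:
    """Split a TOS code such as 'C2' into its category and level."""
    match = re.fullmatch(r"([FTOPCM])([0-3])", code.strip().upper())
    if match is None:
        raise ValueError(f"not a TOS code: {code!r}")
    letter, level = match.group(1), int(match.group(2))
    # Level 0 is assumed to exist only for the T0 background exception.
    if level == 0 and letter != "T":
        raise ValueError(f"level 0 is only valid for T0: {code!r}")
    if letter in KNOWLEDGE:
        return TosCode(letter, level, "knowledge", KNOWLEDGE[letter])
    return TosCode(letter, level, "skill", SKILL[letter])


print(parse_tos("C2"))
```

Keeping the letter and the level as separate fields makes it easy to compare depth of coverage within one category without string handling.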

The  TOS  for  knowledge  and  skills  are  grouped  into  functional  task  sets  according 
to  three  categories  of  personnel:  CO/XO  (Coordinate),  officer  (except  CO/XO) 
(Direct),  and  technicians  (Perform).  Related  statements  should  be  read  as  a  group 
to more clearly understand each of the statements, the increasing coverage, and
the differentiation between statements. To provide a better understanding of
the TOS, the following diagrams illustrate the increases in level, the relationships
between TOS levels, and the paths of training.

Note that in each diagram, a knowledge statement (TOS level T1, T2, or T3) is
followed by one or more skill statements before another knowledge statement is
required. In other words, each knowledge supports one or more skills: T1 supports
O1; T2 supports O2, O3, P1, P2, and C1 (O2 and M1 for officers); and T3 supports
C2 and C3. In addition, it should be understood that T1 is supportive to T2 and
T2 is supportive to T3, although this is not explicitly indicated in the diagram.
Construction  of  the  diagrams  in  this  manner  allows  diagrams  to  be  modified  to 
reflect  specific  training  requirements  for  particular  personnel. 
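
The support relationships can be written down as a small lookup. The sketch below covers only the technician diagram as stated in the text; the names `SUPPORTS`, `KNOWLEDGE_CHAIN`, and `knowledge_needed` are hypothetical and chosen for illustration.

```python
from typing import Dict, List

# Which knowledge level supports which skill levels (technician case in the text).
SUPPORTS: Dict[str, List[str]] = {
    "T1": ["O1"],
    "T2": ["O2", "O3", "P1", "P2", "C1"],
    "T3": ["C2", "C3"],
}

# T1 is supportive to T2 and T2 is supportive to T3.
KNOWLEDGE_CHAIN: List[str] = ["T1", "T2", "T3"]


def knowledge_needed(skill: str) -> List[str]:
    """Return the knowledge levels, in training order, that precede a skill."""
    for position, knowledge in enumerate(KNOWLEDGE_CHAIN):
        if skill in SUPPORTS[knowledge]:
            # Every lower knowledge level supports the one that owns the skill.
            return KNOWLEDGE_CHAIN[: position + 1]
    raise KeyError(f"skill {skill!r} is not in the technician diagram")


print(knowledge_needed("C2"))  # ['T1', 'T2', 'T3']
```

A modified diagram for particular personnel would amount to editing the `SUPPORTS` table, for example replacing the T2 entry with O2 and M1 for officers.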

The entire diagram or a modified diagram would be applicable to each PPP table
indicated  as  being  part  of  the  training  coverage  for  particular  personnel.  The 
entire  diagram  reflects  the  training  requirements  for  personnel  associated  with 
all  operation,  preventive  maintenance,  and  corrective  maintenance. 

A TPC, the second component of the TPS, consists of four parts: (1) Table
Assignment Matrix, (2) PPP Table Index, (3) Table Assignment Chart, and (4)
Curricula  Index.  A  separate  TPC  is  presented  for  each  group  of  POLARIS,  POSEIDON, 
and  TRIDENT  COs/XOs,  officers  and  SWS  technicians. 

The  Table  Assignment  Matrix  indicates  the  particular  PPP  coverage  (by  PPP  table 
number)  required  for  training  the  particular  officer  group  or  type  of  technician. 
The  table  assignment  matrix  also  indicates  the  TOS  levels  of  training  required 
for each listed PPP table.

The  PPP  table  index  provides  the  PPP  table  name  corresponding  to  each  table 
number.  The  Table  Assignment  Chart  provides  a  graphical  presentation  of  a  complete 
training  pipeline  for  a  category  of  personnel.  The  chart  identifies  the  areas 
of  knowledge  and  skills  that  personnel  of  a  specific  category  are  expected  to 
acquire  during  their  training  life.  If  required,  the  chart  may  also  identify 
entry or exit points to or from another training path. The TPC indicates three
categories of training: Background, Replacement/Conversion, and Advanced/Onboard
Training. The Background, Replacement, Conversion, and Advanced training is
conducted in formal courses of study (curricula). The table assignment chart
specifies the PPP coverage provided by each curriculum. Onboard training requirements
are also specified by blocks indicating particular PPP coverage.

The  curricula  index  presents  the  titles  of  all  formal  curricula  associated  with 
a  particular  table  assignment  chart.  The  required  TOS  levels  to  be  covered  by 
each  curriculum  are  also  listed. 

The  TLA,  the  third  element  of  the  TPS,  indicates  those  required  PPP  items 
(knowledge  and  skills)  and  their  assigned  levels  of  training  for  each  group  of 
Strategic  Weapon  System  officers  and  each  Strategic  Weapon  System  technician. 

The  TLA  further  indicates  the  relationship  of  the  PPP  table  items  to  the  TOS 
codes.  The  PPP  table  knowledge  categories  1-1  through  1-6  are  directly  related 
to  the  TOS  codes  F  and  T,  and  the  skill  categories  2-1  through  2-3  are  directly 
related  to  the  codes,  0,  P,  C,  and  M. 

The  groups  of  officers  and  technicians  are  the  same  as  those  described  for  the 
TPC;  therefore,  for  each  TPC  there  is  a  corresponding  TLA. 

Curricula are the third major element of the SWSTP. The curricula are designated
either  formal  or  informal  and  are  designed  to  accurately  reflect  the  training 
requirements  identified  in  the  TPS.  Formal  curricula  are  used  in  training 
facilities  ashore  to  provide  background,  replacement,  conversion,  or  advanced 
training.  They  include  Instructor  Guides,  Trainee  Guides,  and  written  and 
performance  tests. 


The Instructor Guide is the primary element of every curriculum. It establishes
the detailed course learning objectives, sequences the presentation of information,
and  programs  the  use  of  all  other  curriculum  elements. 

Informal  Training  is  received  on  board  submarines  and  tenders  or  ashore  outside 
the  formal  classroom.  Informal  training  is  accomplished  either  through  work  under 
the  supervision  of  more  experienced  SWS  personnel  or  through  use  of  instructional 
media  such  as  Self-Study  Workbooks,  Lecture  Guides  (programmed  slide  presentations), 
and videotapes. Self-Study Workbooks are specifically designed for use on a self-paced
basis and cover topics at system, subsystem, and equipment levels. Informal
training  is  also  used  to  train  individuals  on  new  equipment  or  equipment  altered 
from  its  original  configuration. 

Curricula  materials  are  acquired  and  maintained  through  a  comprehensive  system 
of  stage  submittals  and  review  which  are  designed  to  ensure  the  use  of  curricula 
which  reflect  current  training  requirements  on  the  theory,  maintenance,  and 
operation  of  the  applicable  system,  subsystem  or  equipment,  and  in  the  proper  use 
of  associated  documentation  and  procedures. 

The  Personnel  Qualification  Guides  (PQGs)  are  promulgated  by  the  Submarine  Force 
Commanders.  The  PQGs  identify  specific  knowledge  and  skill  requirements  or 
standards  that  must  be  met  by  personnel  to  "qualify"  or  "requalify"  for  various 
watchstations on board a submarine or tender. These qualifications differ from
training  in  that  they  require  a  specific  demonstration  of  ability  using  the 
actual  equipment  in  its  operational  environment  after  appropriate  training  has 
been  completed.  Replacement  training  provides  the  necessary  knowledge  and 
skills  to  pursue  those  qualifications  requiring  the  least  experience. 

The  Personnel  and  Training  Evaluation  Program  (PTEP)  is  the  final  element  of  the 
SWS  Training  Program  and  is  the  quality  assurance  element.  The  PTEP  measures, 
evaluates,  and  reports  on  the  effectiveness  of  the  program.  It  is  designed  to 
assist overall training management by providing a capability for training monitoring,
evaluation, feedback, and improvements. PTEP accomplishes its objectives by
means  of  personnel  testing,  collection  of  test  and  nontest  data,  evaluation,  and 
reporting.  Both  test  and  nontest  data  are  evaluated  to  determine  trends,  identify 
deficiencies,  and  formulate  recommendations  for  corrective  action.  Evaluation 
results  are  promulgated  in  various  types  of  reports  which  are  distributed  to 
assist commands in increasing personnel proficiency and in implementing improvements
to the SWSTP.

Tests are administered periodically to personnel from the time they enter training
until they leave the system. The PTEP utilizes standardized tests referred
to  as  System  Achievement  Tests  (SATs)  and  Course  Achievement  Tests  (CATs)  for 
individual  and  training  program  assessments. 

SATs  are  administered  periodically  to  all  technicians  in  the  SWSTP  to  measure 
overall  system  knowledge  and  skill  levels.  They  are  also  given  at  the  completion 
of  replacement  training  and  are  based  on  the  PPPs  and  TPS  for  each  technician 
group.  The  primary  purpose  of  SATs  is  to  evaluate  the  training  system.  Their 
secondary  role  is  to  assess  an  individual's  knowledge  and  skill  and  those  of  his 
crew. 


CATs  are  administered  at  the  end  of  each  major  instructional  phase  and  at  the 
completion of a formal course. The test content is defined by the knowledge and
skill requirements of the PPPs and TPS as contained in the Instructor Guide (IG).
Reports  of  tests  are  provided  to  training  commands  to  assist  them  in  improving 
instruction. 


PTEP Data Collection (nontest) is the process of collecting and maintaining information,
other than test results, to provide the basis for test validity studies,
group  performance  evaluations,  and  other  evaluations  of  the  SWS  Training  Program 
elements.  Nontest  data  include  personal  history  data,  group  performance  data, 
the  status  of  training  facilities/hardware/documentation,  and  other  pertinent 
statistics. 

The PTEP evaluation component utilizes all accumulated data to measure the effectiveness
of the training program. Analyses are conducted to measure individual
and crew knowledge and skill levels, curriculum adequacy, and hardware reliability.
The analyses are designed to identify and verify deficiencies within the training
program and formulate recommended corrective actions.

Evaluations are normally initiated on the basis of indications from test results.
In certain circumstances, specific evaluations may be initiated as a result of
tactical hardware performance, reporting requirements, particular inquiries, or
changes in personnel requirements, tactical hardware, training facility/hardware/
documentation, etc.

PTEP provides a variety of external reports based upon the analysis of test and
nontest data. These reports are designed specifically to assist commands in assessing
and improving personnel proficiency and the SWSTP. The most widely used
PTEP reports are: (1) Group System Achievement Test/Group Course Achievement Test
Reports, which present a "quick-look" analysis of the results of personnel testing.
They include the current test scores for each personnel category, individual scores
(for both the overall test and for individual knowledge and skill areas within the
test), and an analysis of group performance in each area. (2) Test Evaluation
Reports (TERs), which contain the results of a statistical analysis of each test
item, each skill test exercise, and each knowledge and skill area. Data from
TERs provide information about tests in general, and test items in particular,
that is valuable in the design and development of replacement tests. (3) The
PTEP Status Report, which is a semi-annual summary of PTEP testing and evaluation
results. It is distributed to activities which have an interest in the current status of
the training program but which are not directly concerned with program management.
(4) Evaluation Reports and Special Investigations, which are developed when a
training deficiency is identified. They contain information related to the
deficiency, provide pertinent background data, and recommend corrective action.


SWS  TRAINING  SEQUENCES 

Training  sequences  are  established  by  manpower  resource  considerations  and  system 
requirements. The levels of training established by the TPS for F1, T0, T1, M1,
C1, O1, P1, T2, and O2 are normally accomplished at the Replacement Training sites.
All other levels of training are normally accomplished in the formal school environment
at the Advanced Training sites and/or the On-Board Training (OBT) environment
for SSBN personnel. All training established by the TPS for SWS Tender personnel
will  normally  be  accomplished  at  the  Replacement  Training  site.  Tender  training 
which  cannot  be  conducted  in  replacement  training  will  be  accomplished  in  the 
SWS  Tender  OBT  environment.  Typical  training  sequences  for  submarine  officers 
and  technicians  are  depicted  below. 


[Figure: Training Sequences, SSBN Officers (CO/XO, Weapons Department Head, Navigation Department Head)]


SWSTP  ORGANIZATION 

The  SWS  Training  Program  organization  is  structured  to  carry  out  the  policy 
objectives  of  the  program.  The  training  program  organization  is  depicted  in  the 
organization chart in Figure (1). The CNO provides overall policy guidance,
approves  changes  and  establishes  requirements  for  training  and  evaluation.  The 
Fleet  CINCs,  Chief  of  Naval  Material,  and  Chief  of  Naval  Education  and  Training 
implement  the  SWSTP  through  their  respective  organizations.  Chief  of  Naval 
Technical Training establishes formal training courses to support the requirements
of the TPS and provides, in conjunction with SSPO, the specifications and
procedures for planning, developing, implementing and maintaining the program's
training material. SSPO provides and maintains current the program elements
for the SWSTP and provides technical coordination and support to the Central
Test Site for PTEP (CTS). CTS conducts the evaluation tests through CTS
Detachments at the SSBN off-crew sites. CTS prepares the tests, analyzes and
evaluates the data, and prepares reports and recommendations for SWSTP improvements.
The Submarine Force Commanders develop and promulgate the SWS Personnel Qualification
Guides and provide requirements and recommendations for OBT materials to
SSPO.

The SWS Training Facilities (TF) conduct formal courses and act as Material Preparing
Activities (MPA) and as Material Support Activities (MSA) for the SWSTP
Materials  Management  task. 


SWSTP  MATERIALS  MANAGEMENT 


The  SWS  Training  Program  materials  include  management  materials,  curriculum 
materials,  and  instructional  media  materials.  The  SSBN  Weapon  System  Training 
Materials  Management  Plan  is  designed  to  provide  timely,  adequate,  and  accurate 
training  materials  for  the  SWSTP.  The  organizations  shown  in  Figure  (1)  perform 
vital  functions  in  the  development  and  support  of  SWSTP  materials  in  the  following 
roles: 

The  Chief  of  Naval  Technical  Training  (CNTECHTRA)  as  the  Training  Agency  (TA) 
for  the  SWSTP  exercises  command  of  and  provides  support  to  the  formal  training 
effort. 

SSPO  as  the  Training  Support  Agency  (TSA)  is  responsible  for  supporting  the  TA 
by providing material and other forms of support within its cognizance.

The  Personnel  Program  Coordinator  (PPC)  is  the  organization  responsible  for 
planning,  designing,  and  providing  a  fully  operational  personnel  subsystem 
under  direction  of  the  TSA.  The  PPC  provides  coordination  and  review  of  training 
material  development  and  support  as  directed  by  the  TSA.  This  role  is  performed 
by a non-hardware contractor for the SWSTP.

A  Materials  Change  Activity  (MCA)  is  an  organization  that  develops  changes  to 
training  materials,  as  directed  by  the  TA  or  TSA.  MCAs  for  the  SWS  Training 
Program are either contractors or Navy activities (e.g., TFs). Materials Preparing
Activities  (MPAs)  develop,  revise,  and/or  produce  training  materials  as  directed 
by  the  TSA  or  TA.  MPAs  in  the  SWSTP  are  either  contractors  or  Navy  activities 
(e.g.,  TFs).  Materials  Support  Activities  (MSAs)  are  responsible  for  training 
materials  surveillance  as  directed  by  the  TSA  or  TA.  MSAs  in  the  SWS  Training 
Program  are  either  contractors  or  Navy  activities  (e.g.,  TFs). 

SSPO,  in  its  role  as  the  TSA,  has  developed  a  "family"  of  contractors  to  support 
the SWSTP. This group of contractors, both hardware and engineering services,
has developed an expertise over many years in the program which allows them to
respond  quickly  and  comprehensively  to  new  or  changed  program  requirements. 

The  training  materials  management  program  consists  of  the  techniques  and  procedures 
for  acquiring  and  supporting  training  materials.  The  development  and  support  of 
each  of  the  three  types  of  training  materials  are  discussed  in  the  following 
paragraphs. 

Management  materials  consist  of  documents  which  contain  the  PPPs  and  the  TPS  and 
program  management  documents  which  contain  specifications  and  procedures.  PPPs 
and  TPS  are  developed  by  contractor  and  training  facility  MPAs,  coordinated  and 
reviewed  by  the  PPC,  and  approved  by  the  TSA  and  TA.  Management  procedures  and 
materials  specification  documents  are  developed  and  supported  by  the  PPC,  under 
the  direction  of  the  TSA,  reviewed  by  program  participants,  and  approved  by  the 
TSA.

Curriculum materials consist of Instructor Guides and Trainee Guides or instruction
sheets. Curriculum materials are developed by contractor MPAs or training
facility  MPAs,  coordinated  and  reviewed  by  the  PPC,  and  reviewed  and  approved 
by  the  TA  and  TSA. 

Instructional  media  materials  are  slides,  overhead  transparencies,  Self-Study 
Workbooks, programmed super 8mm presentations, Exercise Controller Guides,
Administrator's  Guides,  and  Lecture  Guides.  These  materials  are  developed  largely 
by  contractor  MPAs,  but  TFs  may  also  produce  transparencies  and  other  materials. 
The  PPC  performs  coordination  and  review  functions  with  regard  to  media  materials 
development  and  support.  If  the  materials  are  developed  for  use  in  the  formal 
environment,  they  must  be  approved  by  the  TA  with  TSA  concurrence.  If  they  are 
developed  for  use  in  the  informal  environment,  they  are  approved  by  the  TSA  only. 
Since  most  instructional  media  materials  are  changed  by  revision  only,  the  change 
procedure is identical to the development procedure.

All  training  materials  are  maintained  current  and  accurate  by  surveillance  and 
change  efforts.  Surveillance  is  conducted  by  MSAs  to  detect  those  changes  in 
documentation,  equipment,  or  procedures  that  impact  the  training  materials.  When 
the requirement for a change is identified by a contractor MSA, a TMCR (Training
Materials Change Recommendation) is prepared and forwarded to the PPC. Deficiencies
in training materials identified by Navy activities, such as training
facility MSAs, will be reported by the use of a TFR (Trouble and Failure Report)
or a letter to the TSA. In all cases, the surveillance and reporting of training
materials deficiencies are coordinated by the PPC.

Once  the  need  for  a  change  has  been  identified  and  reported,  an  interim  change, 
change,  or  revision  may  be  initiated.  Interim  changes  and  changes  are  prepared 
by MCAs and revisions are prepared by MPAs. The type of change required depends
on  the  magnitude  and  urgency  of  the  change. 


SUMMARY 


The  operational  readiness  of  the  fleet  is  highly  dependent  upon  well-trained 
officer  and  enlisted  personnel.  As  new  weapons  and  delivery  systems  become 
more sophisticated and complex, the need for adequate and cost-effective training
programs becomes more acute. An effective training program requires accurate
identification of training requirements, uniform training for personnel at all
sites, and a timely method for updating materials. The SWS Training Program,
utilizing a systems approach and constant surveillance, meets this need for
the Weapons and Navigation personnel on our strategic submarines.

Braun, Henry I. & Jones, Douglas H., Educational Testing Service, Princeton,
New Jersey. (Thurs. P.M.)


Empirical  Bayes  Assessment  of  Differential  Validity 


An  important  problem  in  the  assessment  of  test  validity  arises 
when  the  population  under  consideration  consists  of  several  distinct 
subgroups distributed in varying proportions across a number of training
sites or programs. Separate prediction equations for each group at
each site may be required, particularly if there is evidence of differential
validity. The authors describe and illustrate a technique,
based  on  the  empirical  Bayes  paradigm,  which  provides  estimates  of  the 
different  prediction  equations  even  when  the  members  of  a  particular 
group  are  sparsely  scattered  across  sites  so  that  unique  least  squares 
estimates  are  not  available.  There  is  no  limit  to  the  number  of 
groups,  sites,  or  explanatory  variables  that  can  be  considered. 


Unbiased  Predictions  in  Sparse  Data  Problems 


Henry  I.  Braun  and  Douglas  H.  Jones 
Educational  Testing  Service 
Princeton,  New  Jersey 


The Graduate Management Admission Test (GMAT) plays an
important role in the admissions decisions made at many graduate
management schools. A candidate's GMAT score is often combined
with  his/her  undergraduate  grade  point  average  in  a  multiple 
regression  equation  to  obtain  a  predicted  first  year  average. 

This  predicted  score  is  but  one  component  in  the  evaluation  of 
the  candidate's  suitability  for  study.  It  is  extremely  important 
that  these  predictions  be  fair;  that  is,  that  no  discernible  group 
of  candidates  have  their  predicted  scores  consistently  higher  or 
lower  than  their  achieved  scores.  This  notion  of  "fairness  to  the 
group"  is  only  one  way  of  posing  the  problem  of  prediction  bias. 
Swinton  (1981)  provides  an  extensive  discussion  of  other  possible 
formulations  as  well  as  a  lucid  survey  of  the  literature  on  the 
subject.

One  purpose  of  this  study  has  been  to  assess  the  quality  of 
the  predictions  of  first  year  averages  in  graduate  management 
school  when  a  single  prediction  equation  is  employed  for  both 
black  students  and  white  students  at  a  given  school.  Our  analysis 
shows that prediction bias does exist in the sense that a single
equation yields predictions for minority students with low predictor
scores which, on average, tend to be somewhat higher than
the  criterion  scores  they  actually  achieve.  This  finding  is 
consistent  with  much  of  the  previous  research  in  this  area  (Rolph, 
et  al.,  1978;  Swinton,  1981). 

A  second  goal  has  been  to  develop  a  new  bias-free  prediction 
system  based  on  GMAT  scores  and  undergraduate  grade  point  average 
(UGPA). The proposed system explicitly employs race as a predictor.
Although  there  are  no  statistical  difficulties,  the  ethical  and 
perhaps  the  legal  propriety  of  this  approach  is  not  well  established. 

Because  of  the  relatively  few  black  registrants  in  our  data 
base,  it  would  be  virtually  impossible  for  the  conventional  least 
squares  procedures  to  provide  parameter  estimates  for  the  type  of 
model  we  propose.  We  have  employed,  therefore,  the  empirical  Bayes 
methodology  (Maritz,  1970).  Although  several  technical  difficulties 
have  had  to  be  overcome,  we  have  succeeded  in  obtaining  useful 
parameter  estimates  that  should  prove  fairly  stable  over  time. 


The Data and Evidence for Prediction Bias


This  study  is  based  on  data  collected  at  fifty-nine  business 
schools  in  the  academic  years  1978-79  and  1979-80.  The  final  data 
base  consists  of  about  8500  records  of  which  approximately  four 
percent  represent  those  of  black  students. 

One  approach  to  the  assessment  of  prediction  bias  is  to 
estimate  separate  prediction  lines  for  majority  and  minority 
students.  However,  as  in  most  studies  of  this  nature,  there  are 
three obstacles to such a simple analysis. The first is the complicated
nature of the selection process (restriction of range on
measured predictors as well as possible selection using unmeasured
predictors). A second difficulty is the questionable comparability
of  the  criterion  across  schools.  Finally,  small  minority  sample 
sizes  complicate  the  estimation  process. 

The  scarcity  of  minority  data  is  illustrated  by  the  fact  that 
of  the  59  schools,  43  have  five  or  fewer  blacks  enrolled  per  year 
of participation in the GMAT Validity Study Service. These 43
schools  enroll  about  20  percent  of  the  minority  students.  It  is, 
of  course,  virtually  impossible  to  estimate  an  accurate  prediction 
line  for  a  school  having  only  five  data  points,  especially  with 
three  predictor  variables.  Even  in  the  five  schools  with  20  or 
more  black  students,  restriction  of  range  renders  it  difficult  to 
appraise  the  nature  of  the  prediction  bias  separately  within  each 
school. 

Inasmuch  as  there  are  only  a  few  schools  with  moderate  numbers 
of  black  students,  the  simplest  way  to  determine  the  existence  of 
predictive  bias  is  to  estimate  a  single  multiple  regression  equation 
at  each  school  for  all  the  students.  To  make  the  outcome  variable 
comparable  across  schools,  the  criterion  or  dependent  variable  chosen 
is  the  Z-score  equivalent  of  the  student's  first-year  average.  The 
Z-score  equivalent  is  the  number  of  within-school  standard  deviations 
above  or  below  the  within-school  mean  that  a  particular  score  falls. 
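
The standardization step can be sketched as follows. This is an illustration only: the function name and the dictionary input are hypothetical, and dividing by n rather than n - 1 in the standard deviation is an assumption.

```python
from math import sqrt
from typing import Dict, List


def standardize_within_school(
    fya_by_school: Dict[str, List[float]]
) -> Dict[str, List[float]]:
    """Convert each first-year average (FYA) to a within-school Z-score."""
    z_scores: Dict[str, List[float]] = {}
    for school, scores in fya_by_school.items():
        n = len(scores)
        mean = sum(scores) / n
        # Population form (divide by n) is assumed here.
        sd = sqrt(sum((s - mean) ** 2 for s in scores) / n)
        if sd == 0:
            raise ValueError(f"no spread in FYA at school {school!r}")
        z_scores[school] = [(s - mean) / sd for s in scores]
    return z_scores


# Made-up averages for two schools, for illustration only.
print(standardize_within_school({"A": [2.0, 3.0, 4.0], "B": [1.0, 1.0, 3.0]}))
```

After this step every school has mean zero and unit spread on the criterion, which is what allows residuals from different schools to be compared.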

Once  the  estimated  combined  prediction  line  is  obtained  by 
ordinary  least  squares,  a  predicted  Z-score  FYA  is  computed  for  each 
black  student  and  is  subtracted  from  that  student's  actual  Z-score 
FYA to obtain a residual. For conciseness this is called the black
residual. Within each school, the black residuals are averaged to
obtain a mean black residual. The mean black residual indicates whether,
on the average, the combined prediction line over- or underpredicts
for the black students in that school.
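
The computation for one school can be sketched in plain Python. The function names are hypothetical, the data are made up, and the small normal-equations solver stands in for whatever regression routine was actually used.

```python
from typing import List, Sequence


def ols_fit(x_rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0, *row] for row in x_rows]
    p = len(design[0])
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[a] * r[b] for r in design) for b in range(p)]
        + [sum(r[a] * yi for r, yi in zip(design, y))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[a][p] for a in range(p)]


def mean_group_residual(
    x_rows: Sequence[Sequence[float]], y: Sequence[float], in_group: Sequence[bool]
) -> float:
    """Mean of (actual - predicted) over flagged students, using the combined fit."""
    beta = ols_fit(x_rows, y)
    residuals = [
        yi - (beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
        for row, yi, flag in zip(x_rows, y, in_group)
        if flag
    ]
    return sum(residuals) / len(residuals)


# One predictor and four students; students 0 and 2 form the group of interest.
x = [[0.0], [0.0], [1.0], [1.0]]
z_fya = [0.0, 1.0, 1.0, 2.0]
print(mean_group_residual(x, z_fya, [True, False, True, False]))  # -0.5
```

A negative value means the combined line predicts higher scores for the group than its members achieve, that is, overprediction.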
The results of the computations yield a preponderance of
negative  mean  black  residuals,  indicating  that  the  single  least 
squares  prediction  line  tends  to  overpredict  FYA  for  black 
students  at  a  large  majority  of  schools.  Further  investigation 
reveals  that  this  apparent  bias  is  not  related  to  the  fact  that 
minority and majority predictor scores are concentrated in different
regions. More specifically, the least squares line leaves
small,  symmetrically  distributed  residuals  for  white  students 
whose  predictor  scores  are  located  around  the  centroid  of  black 
predictor  scores. 


The  Empirical  Bayes  Method 

In  order  to  overcome  the  difficulties  mentioned  in  the 
previous  section,  particularly  the  problem  of  small  sample  sizes, 
previous  studies  have  employed  various  methods  of  pooling  the 
data.  These  have  generally  consisted  of  a  single  regression 
equation with several parameters common to all schools and a
limited number of parameters for individual schools (Boldt, 1975;
Tucker, 1963; Potthoff, 1964). In order to produce stable
prediction systems when many small schools are involved, the
individual  school  parameters  are  usually  suppressed  and  a  single 
equation  is  employed  for  all  (Wilson,  1979). 

An entirely different approach, based on Bayesian principles,
permits each school to have its own prediction equation, even when
sample sizes are small. One methodology has been developed and
implemented by Lindley (1970) and Novick et al. (1972). However,
the  desired  estimates  are  not  easily  obtained  and  their  numerical 
values  are  quite  sensitive  to  the  computational  routines  employed. 
Furthermore,  these  estimates  can  be  substantially  improved  when 
many  parameters  are  involved. 

Motivated by Bayesian ideas, statisticians have also developed
the so-called empirical Bayes techniques (Robbins, 1951;
Efron and Morris, 1973), which require reduced specification of
prior information (critical to Bayesian techniques). Instead,
the data are allowed to determine the probable shape of the prior
distribution.

The  principal  benefit  of  empirical  Bayes,  relative  to 
traditional  methods,  is  that  it  yields  parameter  estimates  which 
are  more  stable  through  time  and  better  predictors  for  future 
data (Rubin, 1980). More importantly, in this study we have
exploited another aspect of the methodology that permits the
estimation of separate prediction equations for whites and blacks
in each school, even where the number of blacks is too small to
permit the classical estimator to be uniquely defined. Moreover,
it employs whatever information is available for black students
in  a  meaningful  and  plausible  way.  The  following  sections  will 
provide  technical  details  and  numerical  results. 

The  Empirical  Bayes  Model  Prediction 

The description of the empirical Bayes model involves two
components: (1) the conditional distribution of the observed
criterion scores given the unobservable parameters and (2) the
distribution of the unobservable parameters themselves. Before
proceeding, we require some notation. As before, i indexes
schools and j indexes students within schools.


Let

$Y'_{ij}$ = observed first year average (FYA),

$\bar{Y}'_i$ = average FYA in school i,

$n_i$ = sample size in school i,

$Y_{ij} = (Y'_{ij} - \bar{Y}'_i) / [\sum_j (Y'_{ij} - \bar{Y}'_i)^2 / n_i]^{1/2}$ = standardized FYA,

$V_{ij}$ = score on GMAT (verbal),

$Q_{ij}$ = score on GMAT (quantitative),

$U_{ij}$ = undergraduate grade point average (UGPA),

$I_{ij}$ = 1 if student is black, 0 if student is white.

The first component of the model can be described by saying that,
given $\beta_i$, the observed standardized FYA is generated
according to the regression:

(1)  $Y_{ij} = \beta_{0i} + \beta_{1i} V_{ij} + \beta_{2i} Q_{ij} + \beta_{3i} U_{ij} + I_{ij}\{\beta_{4i} + \beta_{5i} V_{ij} + \beta_{6i} Q_{ij} + \beta_{7i} U_{ij}\} + \varepsilon_{ij}$

where the $\varepsilon_{ij}$ are independent normally distributed errors with mean
zero and unknown variance $\sigma_i^2$.

In this model, if the jth student is black, all eight terms
of equation (1) are employed in predicting his standardized score;
if he is white, only the first four terms are used, since $I_{ij} = 0$
in that case. In other words, the prediction equation for blacks is
obtained by appending to that for whites an adjustment to the
intercept and coefficients of the three predictors. For example,
the coefficient of UGPA in the prediction equation for blacks is
$(\beta_{3i} + \beta_{7i})$, whereas for whites it is simply $\beta_{3i}$.
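
The way the indicator switches the last four terms on and off can be
sketched in a few lines of Python. This is only an illustration: the
function names and the coefficient values are hypothetical, and the
ordering of the eight coefficients (intercept, verbal, quantitative, UGPA,
then the four black-student adjustments) is an assumption about the
layout of equation (1).

```python
def design_row(v, q, u, is_black):
    """Return the eight regressors of the two-plane model for one student.

    v, q, u  : GMAT verbal, GMAT quantitative, and UGPA predictor scores
    is_black : group indicator (1 = black, 0 = white)
    """
    base = [1.0, v, q, u]
    # The last four regressors are the first four times the indicator,
    # so they vanish for white students.
    return base + [is_black * x for x in base]


def predict(beta, v, q, u, is_black):
    """Predicted standardized FYA for one student in one school."""
    return sum(b * x for b, x in zip(beta, design_row(v, q, u, is_black)))


# Hypothetical coefficients beta_0 .. beta_7 for a single school.
beta = [0.1, 0.3, 0.2, 0.4, -0.2, 0.05, 0.0, -0.1]
white = predict(beta, 0.5, 0.5, 0.5, 0)  # uses the first four terms only
black = predict(beta, 0.5, 0.5, 0.5, 1)  # uses all eight terms
ugpa_coef_black = beta[3] + beta[7]      # UGPA slope on the black plane
```

With these made-up values the white and black predictions differ only
through the four adjustment coefficients.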

Geometrically,  equation  (1)  is  equivalent  to  requiring  two 
different  prediction  planes  in  each  school:  one  for  whites  and 
one  for  blacks.  Until  this  point,  there  has  been  no  mention  of 
parameter  estimation,  and  either  least  squares,  empirical  Bayes 
or some other methodology could be employed. However, with the
small minority sample sizes available here, least squares is
entirely impractical. Hence, we augment (1) with some assumptions
which place the model in an empirical Bayes framework.

The second component of the model specifies that the vectors
$\beta_i = (\beta_{0i}, \beta_{1i}, \ldots, \beta_{7i})$ are a random sample from
a multivariate normal distribution, $N(\mu^*, \Sigma^*)$. The parameters $\mu^*$ and
$\Sigma^*$ are not specified, but rather are estimated from the data under
the above distributional assumptions, which seem reasonable in this
context.

Even with the normality assumptions, maximum likelihood estimates
of $\mu^*$, $\Sigma^*$, and $\beta_i$ (i = 1, ..., 59) cannot be obtained in closed
form. An E-M algorithm (Dempster, Laird, and Rubin, 1977) provides a
relatively simple iterative scheme for finding the maximum likelihood
estimates, as well as estimates of their variability. The
latter are required to assess the significance of the differences
between regression coefficients for blacks and whites.
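
The flavor of the E-M iteration can be shown on a deliberately reduced
problem. The sketch below is not the eight-coefficient model described
above. It assumes, purely for illustration, that each school has a single
unknown mean drawn from a normal prior, and that the error variance is
common to all schools. The function name and the toy data are
hypothetical.

```python
def em_fit(groups, iters=200):
    """E-M for y_ij = theta_i + e_ij with theta_i ~ N(mu, tau2), e_ij ~ N(0, sigma2).

    groups : list of lists, one list of observations per school
    Returns (mu, tau2, sigma2, posterior means of theta_i).
    """
    n_total = sum(len(g) for g in groups)
    mu = sum(sum(g) for g in groups) / n_total  # start at the grand mean
    tau2, sigma2 = 1.0, 1.0                     # arbitrary starting variances

    def e_step(mu, tau2, sigma2):
        # Posterior mean and variance of each school's theta_i.
        post = []
        for g in groups:
            var = 1.0 / (1.0 / tau2 + len(g) / sigma2)
            mean = var * (mu / tau2 + sum(g) / sigma2)
            post.append((mean, var))
        return post

    for _ in range(iters):
        post = e_step(mu, tau2, sigma2)
        # M-step: update the prior mean, prior variance, and error variance
        # from the expected complete-data sufficient statistics.
        mu = sum(m for m, _ in post) / len(post)
        tau2 = sum(v + (m - mu) ** 2 for m, v in post) / len(post)
        sigma2 = sum(
            sum((y - m) ** 2 for y in g) + len(g) * v
            for g, (m, v) in zip(groups, post)
        ) / n_total

    # Final E-step so the returned means match the returned parameters.
    post = e_step(mu, tau2, sigma2)
    return mu, tau2, sigma2, [m for m, _ in post]


# Toy data: four schools, one of which has a single observation.
schools = [[0.2, 0.5, -0.1, 0.4], [1.1, 0.9], [-0.8], [0.0, 0.3, 0.1]]
mu, tau2, sigma2, means = em_fit(schools)
```

Even the school with one observation receives an estimate, pulled from
its raw value toward the estimated prior mean. This is the property that
makes separate equations feasible when a group is very small.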


Results  and  Conclusions 


Our  methods  and  analysis  are  designed  to  answer  two  questions: 

(1)  Under  the  two-plane  model,  are  the  differences 
between  the  prediction  planes  for  white  and  black 
students  statistically  significant  and  of  practical 
importance? 


(2)  If  practical  differences  do  exist,  are  the 
predictions  for  black  students  superior  to 
those  obtained  under  the  classic  model? 

The  first  question  may  be  addressed  in  two  ways.  One  may 
compare  individual  regression  coefficients  for  black  and  white 
students  and  assess  the  magnitude  of  their  differences.  This 
yields  a  substantial  number  of  statistically  significant  results 
at  the  5  percent  level.  However,  instability  of  individual 
coefficients  suggests  that  the  second  method,  described  below, 
is  preferable.  Here  one  considers  the  differences  in  predictions 
between  the  two  models  at  a  number  of  preselected  points  in  the 
predictor  space.  We  have  chosen  to  focus  on  that  point  with 
coordinates  equal  to  the  mean  black  student  score  on  each  predictor 
at  the  particular  school.  In  effect  we  construct  for  each  school 
a  typical  set  of  predictor  scores  and  ask  by  how  much  the  empirical 
Bayes  predictors  of  average  white  student  performance  and  average 
black  student  performance  differ  at  this  point. 
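
This comparison can be written out explicitly. The notation here is a
reconstruction and should be read as an assumption: V, Q, and U stand for
the GMAT verbal score, GMAT quantitative score, and UGPA; the first four
coefficients of school i define the white plane; and the last four are the
adjustments that produce the black plane. Evaluated at the mean black
predictor scores in school i, the intercept and slopes shared by the two
planes cancel, leaving only the adjustment terms:

```latex
\[
\Delta_i \;=\; \hat{\beta}_{4i}
  + \hat{\beta}_{5i}\,\bar{V}_i^{(b)}
  + \hat{\beta}_{6i}\,\bar{Q}_i^{(b)}
  + \hat{\beta}_{7i}\,\bar{U}_i^{(b)}
\]
```

Here the bars with superscript (b) denote means over the black students
in school i, and the hats denote the empirical Bayes estimates.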

The results are quite definitive. More than half the differences
are greater than a quarter of a standard deviation (of
the  score  distribution  in  the  school).  About  a  quarter  of  the 
differences  are  statistically  significant  at  the  5  percent  level. 

The  answer  to  question  (1)  appears  to  be  in  the  affirmative. 

Turning  to  the  second  question,  we  find  that  overall  the 
black  residuals  from  the  empirical  Bayes  model  are  symmetrically 
distributed  and  smaller  in  magnitude  than  those  from  the  classical 
model. The only exception is the relatively small group of black
students with rather high predictor scores. They are somewhat
better predicted by the least squares line. This finding is somewhat
at odds with what the "regression to the mean" theory would
suggest  and  deserves  further  study. 

We  have  also  compared  how  well  the  predictions  of  the  two 
models  correlated  with  the  scores  actually  obtained  by  black 
students.  The  empirical  Bayes  model  proved  superior  in  this 
comparison  as  well,  although  the  method  is  not  meant  to  find  the 
best  fit  for  the  particular  data  at  hand.  In  fact,  further  analysis 
shows  the  improvement  is  due  to  adoption  of  the  two-plane  strategy 
rather  than  empirical  Bayes  methodology  per  se.  Of  course,  in  this 
setting  the  two-plane  model  can  only  be  reasonably  fit  by  using 
something  akin  to  Bayesian  methods. 


Although a number of technical and substantive issues
remain unresolved, we believe that the use of empirical Bayes
methods enables the analyst to successfully construct fair
prediction models in such sparse data problems. Those interested
in the technical aspects of this approach should
consult Braun and Jones (1981).


A NEW APPROACH TO DEVELOPING TEST SCALES (1)


Mr.  Claude  F.  Bridges,  Research  Psychologist 

Office  of  the  Director  of  Institutional  Research 
United  States  Military  Academy 
West  Point,  New  York  10996 


ABSTRACT 

Development  of  new  scales  from  test  items  usually  requires  several  hundred 
cases  in  each  of  the  defined  groups.  Application  of  multivariate  techniques 
to existing test scales also encounters such problems as multicollinearity,
combining ipsative with normative measures and ordinal with equal interval
scales, differential prediction, and communication of multiple scale scores to
untrained users. To meet these problems a method was devised whereby multiple
scales from different tests could be treated as items, yielding new scales
with high validities (several in the .90's) and normal to negligible shrinkage,
even  when  developed  on  relatively  small  groups.  The  results  were  well-received 
by  cadets,  counselors,  and  educators,  and  for  two  years  have  been  incorporated 
as  a  regular  part  of  the  cadet  academic  counseling  program.  The  statistical 
development procedures and the results are illustrated using the
Strong-Campbell Interest Inventory based academic field "Interest Profile
Similarity Scales."


PREFACE 

The former title was "New Scales from Old." It was changed because it
seemed to be too "cute" and connoted only one of the possible applications.
However, it is a succinct presentation of one major application of the concept.
Many well-developed standardized tests, such as the Minnesota Multiphasic
Personality Inventory and the Strong-Campbell Interest Inventory (SCII), have so
many scales that commercial computer scoring services are almost essential,
even if the keys were available. Adding new scales based on new item keys for
other  criterion  groups  or  for  special  purposes  commonly  is  impractical,  and 
always  requires  a  large  number  of  cases. 

Even when enough cases are available in new criterion groups for use of
the usual multivariate techniques, some very difficult problems often are
encountered. When independent variables from several sources are to be combined,
the different measures may be ipsative or normative, and may have nominal,
ordinal, equal interval, and occasionally even ratio-type scales.
Multicollinearity of the standard scales commonly results in failure to make
full use of the information available in the data, since only one of several
correlated scales may be used. This results in loss of precision,
reproducibility, and validity.

(1) Any conclusions in this report are not to be construed as official U.S.
Military Academy or Department of the Army positions unless so designated by
other authorized documents.

If, in effect, scales are treated as test items, multicollinearity of
scales could become an asset. When selecting items for keying to form one
scale, desired multicollinearity is handled by merely adding together the
scores from each item, giving each response the desired weight. This most
commonly is a weight of one, but Claudy (1978) recently showed that a biserial
weighting procedure is superior. Treating standard scales in much the same
way as items, and capitalizing on refinements thus possible, results in a
practical method for developing new special purpose scales from the regular
scales. This method has now been applied, refined, validated, and
cross-validated three times at USMA (in the Academic Years 1978-79, 1979-80,
and 1980-81), and four reports have been prepared (Bridges, 1979, 1980, 1981;
Shuford, 1980).

METHOD 

Overview  of  New  Scale  Development 

a. Compute, for each variable (such as a standard scale provided by the
scoring service), the point-biserial correlation with membership versus
nonmembership in the criterion group.

b. Select scales with highly significant correlations (usually the 30 to
60 most significant); use the z-transformation value for each correlation as
the weight (W) to be assigned to scores on that standard scale (n), and sum
their products to obtain the new scale raw score (NSRSC),

$\mathrm{NSRSC} = \sum_n W_n \times (\text{score on standard scale } n)$

c. Compute the raw scores and their mean and standard deviation for the
criterion scaling group.

d. Use these to determine the desired New Scale standard score equation,

NS Standard Score = 10 * (NSRSC - Mean NSRSC) / SD NSRSC
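
Steps a through d can be sketched in Python as follows. This is an
illustration under stated assumptions, not the procedure as programmed at
USMA. The function names and the toy data are hypothetical. Scales are
chosen here by the size of the correlation rather than by a significance
test. The population standard deviation is used. The standard scores are
centered at 50, which follows the mean of 50 and standard deviation of 10
stated in the Procedure section rather than the equation in step d, which
shows no additive constant.

```python
import math
from statistics import mean, pstdev


def point_biserial(scores, member):
    """Pearson correlation between a scale score and a 0/1 membership flag."""
    mx, my = mean(scores), mean(member)
    cov = sum((x - mx) * (y - my) for x, y in zip(scores, member))
    sx = math.sqrt(sum((x - mx) ** 2 for x in scores))
    sy = math.sqrt(sum((y - my) ** 2 for y in member))
    return cov / (sx * sy)


def scale_weights(data, member, top_k):
    """Steps a and b: correlate each scale, keep top_k, weight by Fisher z."""
    r = {name: point_biserial(col, member) for name, col in data.items()}
    # Assumed selection rule: largest |r| (the paper selects on significance).
    chosen = sorted(r, key=lambda s: abs(r[s]), reverse=True)[:top_k]
    return {s: math.atanh(r[s]) for s in chosen}  # atanh(r) is Fisher's z


def raw_scores(data, weights):
    """Step b: weighted sum of the selected scale scores for each person."""
    n = len(next(iter(data.values())))
    return [sum(w * data[s][i] for s, w in weights.items()) for i in range(n)]


def standard_scores(raw, ref_raw, center=50.0, spread=10.0):
    """Steps c and d: scale against the criterion scaling group's mean and SD."""
    m, sd = mean(ref_raw), pstdev(ref_raw)
    return [center + spread * (x - m) / sd for x in raw]


# Toy data: six people, three in the criterion group, three standard scales.
member = [1, 1, 1, 0, 0, 0]
data = {
    "A": [60, 55, 58, 40, 45, 50],
    "B": [30, 35, 32, 50, 52, 45],
    "C": [50, 49, 51, 50, 51, 49],
}
weights = scale_weights(data, member, top_k=2)
raw = raw_scores(data, weights)
ref = [x for x, m in zip(raw, member) if m == 1]
std = standard_scores(raw, ref)
```

Because negatively correlated scales receive negative weights, a scale
that distinguishes the criterion group by low scores contributes as much
as one that distinguishes it by high scores.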

Variables 

a. The 176 scales from the Strong-Campbell Interest Inventory (SCII)
(Campbell, 1977). These included the Holland Theme Scales, the Basic
Interest Scales, the Occupational Scales, the Academic-Orientation Scale, the
Introversion-Extroversion Scales, and the administrative scales. The SCII was
selected  because  of  its  preeminence  as  the  most  extensively  researched  and 
normed  measure  of  an  individual's  pattern  in  various  types  of  interests.  This 
is  the  basic  instrument  used  in  all  the  projects. 

b. The 36 value scales from the Rokeach Value Survey (Rokeach, 1973).
These short, easy to give, and well-researched measures of an individual's
systems of Instrumental and Terminal Values previously had been given to the
USMA Classes of 1978 and '79 in connection with other research (Bridges, 1973).


c. High School Rank Score. The scaled score based on cadet's rank at
graduation from high school was used as an independent variable in the first
project.

d. The cadets' academic averages, by department, on the courses they had
taken at USMA during their first three terms (first study only).

e.  Each  cadet's  final  cumulative  academic  average  on  the  courses  required 
for  their  academic  area  of  concentration  (in  all  three  projects),  and  for 
their field of study (in the last two projects). In the first project, each
of the four Area QPA's at graduation also was used as a dependent variable for
cadets in that area.

f. A dummy variable for graduation in each area or field. Each cadet was
scored "1" if graduating in the given area or field, and "0" if graduating in
any other area or field. Only the four areas of concentration (Applied
Science and Engineering; Basic Science; Humanities; and National Security and
Public  Affairs)  were  used  in  the  first  project.  Fields  for  which  there  were 
25  or  more  graduating  cadets  were  used  also  as  dummy  variables  for  the  other 
two  projects.  All  of  the  foreign  language  fields  were  also  grouped  together 
as  another  dummy  variable.  The  dummy  variables  were  the  dependent  variables. 

g. Two experimental dummy variables suggested by Biglan's model (1973).
For one dummy variable, cadets in either of the established paradigm
("Physical Science") areas received a score of "0," and those in "Nonphysical
Science" areas a score of "1." For the other dummy variable, cadets in the more
applied areas (AS & E and NS & PA) were scored "1," and those in the "purer,"
more basic areas (Basic Science and Humanities) were scored "0" (in the second
study).

Samples 

The  total  samples  included  all  cadets  in  the  respective  graduating  classes. 
The  numbers  varied  in  the  successive  analyses  due  to  missing  data  on  one  or 
more  of  the  variables  involved. 


                                  Class of   Class of   Class of
                                    1978       1979       1980

Total                                844        843        909
Applied Sciences                     299        308        312
Basic Sciences                        79         90        137
Humanities                            95         62         80
NS & PA                              335        333        294
Management (Interdisciplinary)        36         50         86


Procedure 


a.  The  AY  78-79  study. 

(1) The steps summarized in the above overview were used to develop three
complete sets of scales to discriminatingly characterize cadets who graduated
in each of the four academic areas of concentration from all other cadets.
The equations for the SCII Occupation Interest-based scales included weights
only for point-biserial correlations significant at the .001 level. A few
correlations with lower significance levels were used for the SCII-based
General Interest scales and for the Rokeach Value-based scales. The three
sets of raw scores were converted to standard scores with a mean of 50 and a
standard deviation of 10, for cadets who had graduated in the upper half of
the class on their respective Area QPA's. These were called Area
Compatibility Scores, though they did not include direct measures of the cadet's
differential abilities for the courses in each area.

(2)  Using  the  same  procedure,  four  sets  of  raw  score  equations  were 
likewise  developed  with  earned  area  QPA's  as  criteria.  The  set  for  each  area 
included  an  interest  pattern  raw  score,  a  Rokeach  values  pattern  raw  score, 
and  an  early  USMA  academic  performance  raw  score.  The  three  area  raw  scores 
and  high  school  rank  score  were  combined  into  four  regression  equations  that 
yielded  the  most  accurate  prediction  of  each  of  the  four  Area  QPA’s.  These 
were  used  to  produce  for  each  group  Area  Predicted  QPA  Scores,  scaled  to  a 
mean  of  50  and  a  standard  deviation  of  10  for  cadets  in  the  relevant  area. 

(3) The equations for the new scales developed on portions of the
Class of 1978 were used to compute area compatibility scores and four
predictive area QPA's for the entire Class of 1978 and, for cross-validation, also
for the Class of 1979. Since the compatibility scores were for cadets'
guidance in choosing an area of study, distributions of these scores for
cadets in each area were more important than correlations as validity
information for these scores. These data were tabled for cadets in each half on
their  Area  QPA. 

(4)  A  simplified  computer  printout,  listing  the  cadet's  compatibility 
scores  for  each  area,  was  provided  to  a  sample  of  company  academic  counselors 
to  obtain  experience  in  their  delivery  and  use. 

(5) To investigate the extent to which cadets currently were graduating
in areas closely conforming to their pattern of interests, values, and
academic abilities, and also to determine the dimensional structure of the
areas and its conformance to Biglan's (1973) model, a discriminant analysis
was  performed  with  the  four  area  predicted  QPA's  as  independent  variables  and 
the  four  areas  as  groups. 

b.  The  AY  79-80  Study. 

(1) The independent variables were limited to those from the SCII.
Cadets  graduating  in  the  Classes  of  1978  and  1979  were  combined.  To  the 
dependent  variables  were  added  the  21  fields  of  study  having  25  or  more  cases. 
In  addition,  cadets  in  the  seven  foreign  languages  were  grouped  together. 
(2) The 21 Field of Study and the four Area of Concentration Interest
Profile Similarity Scales were developed and reported for all cadets in the
Class of 1982. The computer-produced profile was used as desired by the
Academic Counselors during their guidance sessions with their cadets.

(3) The counselors' responses to a questionnaire on all aspects of the
Academic Guidance Program were obtained, analyzed, and reported (see Bridges,
1980).


c.  The  AY  80-81  Study. 

(1)  The  graduating  cadets  in  1978,  1979,  and  1980  were  combined  by 
area  and  field  of  study.  Thus,  it  was  possible  to  add  four  more  fields  of 
study,  and  refine  the  few  field  of  study  IPSS  scales  that,  in  comparisons  with 
other  closely  related  fields,  had  been  found  to  yield  interpretative  anomalies 
the  previous  year.  Cadets  graduating  in  Physics  and  those  in  Computer  Science 
had  been  found  typically  to  have  higher  IPSS  scores  for  mathematics  than  for 
their  own  field.  Likewise,  cadets  in  Literature  typically  had  higher  Foreign 
Language  IPSS  scores. 

(2) The computer-produced IPSS Profile was refined and used by the
counselors  with  cadets  in  the  Class  of  1983  in  choosing  their  field  of  study. 

(3)  Questionnaire  surveys  of  both  counselors  and  cadets  were  conducted 
for  further  refinements  of  the  materials  and  information  on  their  use  provided 
in  the  Dean’s  Academic  Counseling  Program. 


RESULTS  AND  DISCUSSION 


The AY 78-79 Study

a. Validity. An exceedingly long paper would be required to present all
of the relevant useful results. We can only hit some of the highlights.
(Those interested in more details will find them in Bridges, 1979.) For
guidance  in  choosing  one  of  the  four  areas,  the  primary  concern  is  the  rank 
of  each  of  the  four  areas  relative  to  each  other,  rather  than  their  absolute 
magnitude.  Thus,  one  of  the  best  indications  of  their  validity  is  the  extent 
to  which  cadets  successful  in  an  area  tend  to  get  higher  scores  for  that  area 
than  for  other  areas.  Cadets  who  graduated  in  each  area  were  divided  into  two 
groups  on  the  basis  of  their  Quality  Point  Average  at  graduation,  based  only 
on  the  courses  directly  relevant  to  their  Area.  Table  1  presents  the  mean 
score  made  on  sixteen  scales  by  cadets  who  graduated  in  each  half  on  their 
area specific QPA. Note that both groups of cadets in each area
characteristically had a significantly higher average on their own area guidance
scores than on the guidance scores for any of the other areas. However, cadets in
the Low-Half Area QPA group for Basic Sciences tended to have about the same
or slightly higher scores on the scales for Applied Sciences and Engineering
than on the corresponding scales for their own area. Likewise, Low-Half
Humanities area cadets had the same or slightly higher NS & PA scores than
for Humanities. Incidentally, the regression equations used in developing
the predicted Area QPA's indicate that most of the weight in the predictions
of Area QPA goes to the cadet's previous academic performance, but the values
and interest measures also contributed significantly to increased accuracy of
predictions in all except the Basic Science Area. The percent of total
prediction of actual Area QPA contributed by each measure is as follows:

                       USMA
                     Academic       H.S.       SCII      Rokeach
Area                Performance     Rank     Interest    Values

Applied Science         71%          3%        14%         12%
Basic Science           88%         10%         -           2%
NSPA                    77%          2%        11%          9%
Humanities              65%          -         21%         14%

b. Generalizability. Evidence on the generalizability to other classes
of the scaling equations developed on a sample from the Class of 1978 is
presented in Table 2. The percent of the total in each group ranged from about
10% for the Humanities Area groups to about 40% for NS & PA area groups. The
common correction, dividing the point biserial by the maximum possible for
that proportion, increases comparability between areas and between scores.
Except for the Area scales based on the Rokeach Values data from administration
to the Class of 1979 about three years previously, the scales demonstrated
unusual resistance to shrinkage. These concurrent cross-validities are even more
impressive when the small number of criterion group cases in some of the
developmental analyses for the Basic Sciences and Humanities areas is considered.
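
The paper does not spell out how the "maximum possible" value is
obtained. One common version, assumed here for illustration, takes the
continuous score to be normally distributed, so that the largest
attainable point biserial for a group holding proportion p of the cases is
the normal ordinate at the cut point divided by the square root of
p(1 - p). The function names below are hypothetical.

```python
import math
from statistics import NormalDist


def max_point_biserial(p):
    """Upper bound on the point biserial when a normal variable is split at proportion p."""
    z = NormalDist().inv_cdf(p)  # cut point on the standard normal scale
    return NormalDist().pdf(z) / math.sqrt(p * (1.0 - p))


def corrected(r_pb, p):
    """Point biserial expressed as a fraction of its attainable maximum."""
    return r_pb / max_point_biserial(p)


# The bound is largest for an even split and shrinks as groups get smaller.
bound_even = max_point_biserial(0.5)
bound_small = max_point_biserial(0.1)
```

Under this assumption a group of about 10% of the class has a much lower
ceiling than a group of about 40%, which is why uncorrected values for
small and large areas are not directly comparable.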

c.  Dimensional  Structure  of  Curriculum.  The  Physical  Science/Non-Physical 
Science  Score  results  from  combining  the  cadet's  selected  compatibility  and 
Predicted  Area  QPA  standard  scores  into  one  regression  equation  that  most 
accurately  distinguishes  between  cadets  graduating  in  the  Physical  Science 
areas  (Applied  and  Basic  Science  areas)  and  those  graduating  in  the  other 
areas (NSPA and Humanities). The corrected point-biserial correlation between
the Physical Science/Non-Physical Science Scores and membership in the
Physical Science Group or the Non-Physical Science group was .90 for the sample on
which  developed,  and  .85  for  the  total  Class  of  1978.  When  cross-validated  by 
using  the  same  equations  for  the  Class  of  1979,  the  correlation  was  .85,  still 
remarkably  high. 

Discriminant analysis, using the four areas as groups, and the four area
Predicted QPA Scaled Scores as independent variables, provides confirmation and
further  structure  to  be  expected  if  the  USMA  curriculum  conforms  to  the  Biglan 
Model  (Biglan,  1973).  Pertinent  results  follow. 

Standard Discriminant Function Coefficients

Independent
Variables                 Func 1     Func 2     Func 3

APQPASS                   +2.143     +1.177     +3.407
BPQPASS                   +0.449     -0.945     -4.303
NPQPASS                   -1.709     +0.803     -0.767
HPQPASS                   -0.278     -1.840     +1.731

Canonical Correlation       .70        .31        .13
Relative % Variance         88%        10%         2%


Table 2: The Concurrent Validity and Cross-validity of the Seventeen Scores

Correlations between Each Set of Scores and their Criterion* for All Cadets
Having Data on the Corresponding Pair of Variables, in the Total Class of 1978
(Maximum N = 733) and in the Class of 1979 (Maximum N = 741). For the
dichotomous dummy variables, the obtained point biserial correlations have
been corrected to same base.

                                   Class of 1978              Class of 1979
SCORES                           N   r pt. bis.    r        N   r pt. bis.    r

COMPOSITE A/B-N/H              255    (.674)     .853     307    (.652)     .826

PREDICTED AREA QPA'S
  APQPA                        115     (-)       .860     187     (-)       .823
  BPQPA                         51     (-)       .933      31     (-)       .922
  NPQPA                        149     (-)       .866     161     (-)       .884
  HPQPA                         37     (-)       .916      24     (-)       .862

AREA COMPATIBILITY STANDARD SCORES

SIMILARITY OF GENERAL INTERESTS
  AGISS                        639    (.5274)    .685     688    (.4792)    .614
  BGISS                        632    (.3744)    .646     656    (.1298)    .282
  NGISS                        639    (.5256)    .528     648    (.4675)    .592
  HGISS                        706    (.3038)    .494     755    (.1832)    .331

SIMILARITY OF OCCUPATIONAL INTERESTS
  AOISS                        563    (.4983)    .647     580    (.5633)    .722
  BOISS                        582    (.3182)    .549     588    (.1815)    .395
  NOISS                        565    (.5050)    .643     587    (.5552)    .703
  HOISS                        604    (.2708)    .440     618    (.1534)    .279

SIMILARITY OF ROKEACH VALUES
  ARVSS                        589    (.2505)    .325     559    (.0377)    .048
  BRVSS                        589    (.2581)    .445     559    (.0358)    .079
  NRVSS                        589    (.2177)    .277     559    (.1040)    .132
  HRVSS                        589    (.1718)    .279     559   (-.0043)    .008

The  first  function,  which  clearly  is  Biglan’s  Established  Paradigm  dimension, 
dominates,  accounting  for  88%  of  the  explained  variance  between  groups  on  the 
components  of  the  predicted  QPA’s.  The  second  function  conforms  to  his 
"Applied  Science--Basic  Science"  dimension,  and  yields  10%  of  the  explained 
variance.  The  USMA  areas  of  concentration  provide  no  groups  for  Biglan's 
"Living  organism--nonliving"  subject  matter  dimension. 

The  centroids  on  the  first  two  discriminant  functions  for  the  four  groups 
could  conform  to  Guttman's  1954  criteria  for  a  circumplex,  but  with  only  four 
points  one  cannot  tell. 
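
The relative percentages of variance quoted for the discriminant
functions can be checked against the canonical correlations. The sketch
below uses the usual relation in discriminant analysis between a
function's canonical correlation r and its eigenvalue, r^2 / (1 - r^2),
and expresses each eigenvalue as a share of their sum. That the reported
percentages were computed this way is an assumption, and the function
name is hypothetical.

```python
def relative_variance(canonical_correlations):
    """Percent of discriminating variance attributable to each function."""
    # Eigenvalue of each discriminant function from its canonical correlation.
    eig = [r * r / (1.0 - r * r) for r in canonical_correlations]
    total = sum(eig)
    return [100.0 * e / total for e in eig]


# Canonical correlations reported for the three functions.
shares = relative_variance([0.70, 0.31, 0.13])
```

With the rounded correlations of .70, .31, and .13 the shares come to
roughly 89, 10, and 2 percent, which agrees with the reported 88%, 10%,
and 2% to within the rounding of the inputs.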
The AY 79-80 Study


a. Combining the Classes of 1978 and 1979 made possible "sharpening-up"
the four Area Interest Profile Similarity Scales (IPSS) as well as the
development of "Field of Study" IPSS equations. The mean IPSS scores made by
the "high-half" and "low-half" groups presented the same general picture as
that shown in Table 1 for area interest based scales. In developing these
scales, a higher minimal level of significance was possible, and earlier
"General Interest" SCII scales were combined with the Occupation Interest
scales to produce only one score per group. The result was a
computer-produced profile for each cadet in the Class of 1982 which was distributed
and used by their Academic counselors when providing guidance on the choice
of field of study. (For a more detailed technical report, see Bridges, 1981.
For less technical documentation, see Shuford, 1980.)

b. More definitive information on the basic dimensions of the USMA
curriculum resulted from the two equations for an Applied Science Raw Score
(APSIRSC) and a Non-Physical Science Raw Score (NPSCIRSC), based solely on
cadets' discriminating interest patterns. Correlations were computed between
cadets' scores on these two scales and their IPSS scores for their own
graduating area and field. The two were also correlated with the dummy variable
for each of 36 area/field groups. The results are depicted in Figure 1.
These data and an ISS intercorrelation matrix substantiate conformance to
Guttman's circumplex criteria and to Biglan's model.

Figure 1 indicates that much of the basic simple structure of USMA's Areas of
Concentration and Fields of Study seems to be described by the extent to which
each is related to both an "Applied Science--Pure Science" dimension and a
"Non-Physical Science--Physical Science" dimension. The locations for area
dummy variables (01 to 05) are largely as expected, and dummy variables for
their fields tend to cluster around them.

Two  other  characteristics  warrant  special  comment.  First,  when  the  point 
plotted  for  each  pair  of  dummy  variable  correlations  is  connected  to  the  cor¬ 
responding  point  for  that  group's  ISS,  the  ray-like  pattern  depicts  sharply 
the  general  similarity  of  the  patterns  of  interrelationships  for  the  two  sets 
of  variables.  Second,  the  circular  structure  of  the  ISS  pattern  depicted  in 
Figure  1  is  known  as  a  circumplex.  The  intercorrelations  confirm  that  in 
general  the  closer  the  points,  the  higher  the  correlations  between  the  ISS 
scores. Also, in accordance with Guttman's criterion (Guttman, 1954), the correlations
between interest similarity scale scores tend to be negative for
ISS's plotted directly across the circle from each other.

The basic structure of the field ISS's clearly shows close conformance to
their area's classifications in the Biglan model, with the single exception
of the Economics field. The figure both provides further indication that the
ISS's are valid and suggests further exploration of the use of these two
dimensions to provide field-choice guidance to the cadets without even one
"interpretative anomaly." A "Map of Fields" (a graph on which are plotted the
centroids for each field) could be provided to cadets together with their own
APSCI and NPSCI scores. When the cadet plots his/her point on the "map," the
fields closest to his/her own interest pattern would be clearly
indicated. This might also be supplemented by a simple individual report of
the cadet's discriminant scores for each field. These suggestions need to be
checked further, as the use of higher-order factors for practical individual
prediction purposes often has been found to yield disappointing results--one
reason for initially using it only as a backup procedure.

FIGURE 1: Correlations of APSCI and NPSCI Scores with Dummy Variables (•) and Area/Field ISS Scores (©).
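
The "Map of Fields" lookup suggested above amounts to a nearest-centroid search on the two-dimensional (APSCI, NPSCI) plane. The following is only a minimal sketch of that idea: the function name, the field labels, and every coordinate are hypothetical illustrations, not values from the study, and plain Euclidean distance is an assumption.

```python
import math


def nearest_fields(cadet_point, centroids, k=3):
    """Return the k field names whose centroids lie closest to the cadet's point.

    cadet_point: (APSCI, NPSCI) pair for one cadet.
    centroids:   dict mapping field name -> (APSCI, NPSCI) centroid.
    Distance is plain Euclidean distance (an assumption of this sketch).
    """
    ranked = sorted(
        centroids,
        key=lambda name: math.dist(cadet_point, centroids[name]),
    )
    return ranked[:k]


# Hypothetical centroids, invented for illustration only.
example_centroids = {
    "Field A": (0.60, -0.40),
    "Field B": (-0.50, 0.55),
    "Field C": (0.10, 0.05),
}

if __name__ == "__main__":
    print(nearest_fields((0.5, -0.3), example_centroids, k=2))
```

A report of this kind would only rank fields by proximity; as the text cautions, such higher-order scores would still need to be checked before being used for individual guidance.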

Incidentally,  Army  Service  Branch  IPSS  scores  developed  from  SCII  scales  for 
officers  in  the  larger  Service  Branches  also  tend  to  form  a  circumplex. 

Factor  analysis  found  two  factors  and  oblique  rotation  indicated  their  nega¬ 
tive  correlation  of  -.52.  The  major  dimension  has  Engineers,  Ordnance,  and 
Signal  Corps  at  one  end,  and  Transportation  Corps  and  Quartermaster  Corps  at 
the  other.  It  seems  to  be  analogous  to  Biglan's  established  paradigm  dimen¬ 
sion.  The  second  dimension  has  Military  Intelligence  at  one  end  and  Field 
Artillery  at  the  other.  This  may  be  Biglan's  "living  organism — inanimate" 
dimension.  It  seems  that  the  Army  service  branches  studied  do  not  have,  as  a 
whole,  enough  of  the  Applied-Pure  differentiation  for  this  dimension  to  show 
up in this factor analysis. None of the service branches stands out as "pure
researchers."

The  AY  80-81  Study 

Addition of the cadets graduating in the Class of 1980 made possible the
development of IPSS equations for additional fields of study, and the improvement
of the eight field equations for which small interpretative anomalies had
occurred previously. Not all interpretative anomalies have been eliminated,
but they have been reduced. High-half cadets in Physics and Computer Science
still have a significantly higher mean IPSS for Mathematics than for their own
field. Cadets in Literature still have a higher mean on Foreign Languages
IPSS than on Literature IPSS. Campbell (1977) stated that, for some unknown
reason, a few anomalous occupation criterion groups still were found typically
to score higher on another occupation scale than on their own scale.
Perhaps further study of these field IPSS anomalies can throw more light on
this situation.

The resulting computer-produced profile, a sample of which is shown as Figure 2,
was distributed to the cadets in the Class of 1983 by the counselor involved in their
choice of a field of study.

General  Comments 


The  interest  similarity  profiles  have  been  reported,  by  both  cadets  and 
academic  counselors,  as  contributing  to  increased  satisfaction  and  success 
of  cadets  in  choice  of  area  of  concentration  and  field  of  study.  The  recent 
report  of  the  Evaluation  team  representing  the  Commission  on  Higher  Education 
of  the  Middle  States  Association  of  Colleges  and  Schools  stated  as  follows: 

"We  think  the  academic  advising  process  is  outstanding.  .  .  .  The  recent 
development which permits the Company Academic Counselor to use information
developed through the Strong-Campbell inventory instrument is to be commended.
This development, which requires far less training than the use of the inventory
itself, enables the cadet to know how his or her interests correlate with
the interests of those who have been successful in the various concentrations.
This  is  a  very  helpful  advising  tool." 

Conclusion 


The  scales  developed  by  this  approach  provide  valid  and  useful  measures  of 
the  similarity  of  an  individual  to  a  criterion  group  on  important  group  dis¬ 
criminating  characteristics. 
Reading Ability, Readability, Motivation and Test Validity


According  to  a  prevalent  argument  about  occupational  proficiency 
testing,  multiple-choice  tests  have  built-in  sources  of  invalidity. 
One  such  source  pertains  to  the  reading  competencies  required  in  taking 
a  multiple-choice  test.  The  test,  it  is  argued,  requires  reading 
competencies  beyond  those  required  by  the  job  itself.  Thus,  one  finds 
job  incumbents  whose  job  performance  is  outstanding  but  who  cannot 
score  well  on  a  paper-and-pencil  test.  There  is  also  a  rather  commonly 
accepted  argument  about  the  interactive  effects  of  reading  ability  and 
motivation on performance. How well one handles written material is a
function  of  motivation  as  well  as  reading  ability.  The  present  paper 
will  analyze  the  correlation  of  test  performance  and  reading  ability  in 
relation  to  the  readability  of  the  test  materials  and  the  motivation  of 
examinees. 
Reading Ability, Readability, Motivation and Test Validity

Clay V. Brittain, Army Training Support Center
Mary M. Brittain, Virginia Commonwealth University

This  study  was  concerned  specifically  with  the  performance  of  soldiers  on 
Skill  Qualification  Tests  (SQT)  in  relation  to  the  readability  of  the  tests. 

More  generally,  the  paper  also  touches  upon  the  effects  of  soldiers'  motivation 
and  training  on  SQT  performance. 

First, we will comment briefly about the makeup and scoring of SQT. Skill
Qualification Tests are made up of the following parts, or components: the
Job Site Component (or JSC), the Hands-On Component (or HOC), and the Skill
Component (or SC).

The  Job  Site  Component  (JSC)  is  not  a  test  in  the  conventional  sense. 
Rather,  supervisors  rate  soldiers  on  their  ability  to  perform  certain  tasks. 

The  Hands-On  Component  (HOC),  as  the  name  implies,  is  made  up  of  performance 
tests.  Soldiers  perform  certain  tasks  under  a  set  of  standard  conditions 
and  their  performance  is  scored  according  to  clearly  specified  criteria. 

The Skill Component (SC), as the name does not imply, is a multiple-choice test.
The  SQT  score  is  an  aggregate  of  the  component  scores.  The  written  portion 
of  the  SQT  (the  Skill  Component)  is  the  major  source  of  variance  in  SQT  scores; 
and,  of  course,  this  is  the  component  where  the  problem  of  readability  arises. 

One  of  the  major  concerns  about  printed  multiple-choice  tests  of  occupa¬ 
tional  competence  is  that  the  test  may  require  reading  ability  in  excess  of 
that  required  by  the  job  itself.  If  an  SQT  requires  reading  skills  which  are 
excessive  in  relation  to  the  reading  abilities  of  soldiers  and  the  demands  of 
the MOS, then there obviously are serious implications for test validity.
Soldiers with limited reading skills may not be able to achieve test scores
which are commensurate with their MOS competence.

Major  Questions  and  Methodology 

With  respect  to  readability,  the  study  was  designed  to  address  three 
questions  as  follows: 

(1) Does the readability of SQT match the reading ability in the target
soldier population?

(2)  To  what  extent  is  reading  ability  a  factor  in  the  SQT  performance 
of  soldiers? 

(3)  To  what  extent  is  the  association  between  reading  ability  and  SQT 
performance  modified  by  other  factors,  specifically  soldier  motivation? 

These  questions  were  addressed  here  through  analysis  of  ten  SQT  and 
data  relating  thereto.  The  SQT  were  all  skill  level  one  tests.  Skill  level 
one  is  the  lowest  of  the  five  skill  levels  in  the  Army  MOS  structure.  The 
SQT  were  drawn  from  the  ten  MOS  shown  in  Table  1.  The  first  five  MOS  listed 
here are Combat Arms MOS. The latter five are Support MOS (i.e., Combat
Support/Combat Service Support). This mix of MOS was by design.

(Table  1  About  Here) 

As  part  of  a  larger  study,  data  for  use  in  evaluating  these  ten  SQT  were 
collected  from  skill  level  one  soldiers  who  had  taken  the  SQT,  and  from  their 
supervisors.  The  SQT  also  were  reviewed  by  specialists  from  the  civilian  testing 
community.  The  paper  draws  upon  data  provided  by  the  field  research  and  the 
expert  reviews  of  the  SQT. 

Match  Between  SQT  Readability  and  Soldier  Reading  Ability 

The readability of the SQT, expressed as a reading grade level (RGL), was
estimated through application of the Flesch-Kincaid reading ease formula, which
uses sentence length and word length to estimate the readability of prose
materials (Kincaid & Fishburne, 1977).
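
As an illustration of how sentence length and word length combine into a reading grade level, the sketch below uses the commonly published Flesch-Kincaid grade-level equation. It is an assumption that this is the variant applied in the study; the paper does not give the equation or say how syllables were counted, so the counts are simply taken as inputs here.

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Estimate a reading grade level from three text counts.

    Uses the commonly published Flesch-Kincaid grade-level coefficients:
        0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    Whether the study used exactly this variant is an assumption.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a short test passage.
    print(round(flesch_kincaid_grade(100, 10, 150), 2))
```

Longer sentences and longer words both raise the estimate, which is why reviewers' complaints about items that were "unnecessarily difficult in wording" bear directly on the RGL of a test.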

The  reading  ability  of  soldiers,  also  expressed  as  a  reading  grade  level 
(RGL), was estimated from AFQT scores. The average reading ability in each MOS
population  was  estimated  using  a  conversion  scale  based  upon  correlations  of 
AFQT  scores  with  scores  on  the  Nelson-Denny  Reading  Test  and  the  Gates-MacGinitie 
Reading  Test.  This  conversion  scale  was  developed  by  the  Air  Force  Human 
Resources  Laboratory  (Mathews,  Valentine  &  Sellman,  1978).  Estimates  of  the 
RGL  of  each  SQT  and  the  RGL  of  the  soldier  population  for  which  the  SQT  was 
intended are shown in Table 2. These estimates indicate consistent matches
between  SQT  readability  and  soldier  reading  ability.  The  average  RGL  of  the 
SQT  generally  is  below  the  average  RGL  of  the  soldier  population.  The  only 
exception  to  this  match  of  SQT/RGL  and  soldier/RGL  is  the  SQT  for  MOS  55B. 

(Table  2  About  Here) 

The  conclusion  indicated  here  is  that  readability  is  not  a  major  source 
of  invalidity  in  the  SQT.  The  conclusion  also  is  consistent  with  data  from 
the  interviews  of  noncommissioned  officers. 

When asked for their opinions as to the major reasons why soldiers fail
SQT, NCOs only infrequently mentioned reading ability.

Reading  Ability  as  a  Factor  in  SQT  Performance 

It  should  not  be  concluded,  however,  that  reading  ability  is  unimportant 
in  SQT  performance.  A  questionnaire  which  was  completed  by  soldiers  included 
the  item  "I  have  difficulty  understanding  the  written  portion  of  my  SQT." 

Soldiers  responded  by  checking  one  of  the  following:  never,  seldom,  sometimes, 
usually,  or  always.  The  percentage  of  soldiers  in  each  MOS  who  responded  with 
"sometimes,"  "usually"  or  "always"  is  shown  in  Table  3.  There  is  a  substantial 
proportion of soldiers who reported difficulty in understanding the skill component.

(Table  3  About  Here) 


Soldier  responses  to  the  item  "I  have  difficulty  understanding  the  written 
portion  of  my  SQT"  were  quantified  and  correlated  with  SQT  scores.  The 
correlations,  while  not  all  statistically  significant,  were  all  negative.  Finally, 
we  computed  correlations  between  AFQT  scores  and  SQT  scores  and  regarded  these 
correlations  as  estimates  of  the  contribution  of  reading  ability  to  SQT  variance. 
The  correlations  are  shown  in  Table  4. 
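
Whether a correlation of a given size is statistically significant depends heavily on the sample size in each MOS. The sketch below assumes the usual t test for a Pearson correlation with n - 2 degrees of freedom; the paper does not state which test was used, so this is an illustration rather than a reconstruction of the authors' procedure.

```python
import math


def t_statistic_for_r(r, n):
    """Return the t statistic for testing a Pearson correlation against zero.

    t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
    The choice of this test is an assumption; the source does not name one.
    """
    if n <= 2:
        raise ValueError("need more than two cases")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # A correlation and sample size of the order reported for one MOS.
    t = t_statistic_for_r(0.43, 31)
    print(f"t = {t:.2f} with {31 - 2} degrees of freedom")
```

The resulting t value would then be compared with a tabled critical value for the chosen significance level, which is why a moderately large correlation in a very small sample can still fail to reach significance.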

(Table 4 About Here)

It  is  pertinent  to  note  here  that  on  each  SQT,  the  reviewers  found  items 
which  were  complex  and  unnecessarily  difficult  in  wording.  In  a  number  of 
instances, these items were answered correctly by even fewer soldiers than would
be expected on the basis of chance.

The  conclusion,  stated  above,  that  readability  of  the  test  is  not  a  serious 
problem  must  then  be  qualified.  Readability  of  the  test  and  reading  ability  of 
soldiers  clearly  are  factors  in  SQT  performance. 

Effects  of  Soldier  Motivation 

We  now  wish  to  address  the  third  question:  to  what  extent  is  the  association 
between  reading  ability  and  SQT  performance  modified  by  other  factors,  especially 
soldier  motivation? 

(Figure  1  About  Here) 

The  question  stems  from  a  simple  model  of  reader  performance  (Klare,  1973), 
shown  in  Figure  1.  It  says,  in  effect,  the  reader's  performance  depends  upon 
the  interaction  of  three  factors:  the  reader's  level  of  competence,  the 
readability  of  the  materials,  and  the  level  of  reader's  motivation.  Thus,  a 
highly  motivated  soldier  might  effectively  deal  with  materials  at  a  level  of 
readability  that  the  less  well  motivated  soldier  of  comparable  reading  ability 
could  not  deal  with  effectively.  This  was  an  area  which  we  wished  to  explore. 
As a means of assessing motivation, soldiers responded to the following questions:

How important is it to you to get a good score on your SQT?

How  important  do  you  feel  the  SQT  is  for  promotion? 

Do  you  intend  to  make  the  Army  a  career? 

Would  you  recommend  your  MOS  to  someone  considering  a  career  in  the  Army? 

It  was  expected  that  soldiers  who  expressed  their  intentions  to  make  a  career 
of  the  Army  and  those  who  would  recommend  their  MOS  to  someone  else  would  probably 
be  more  strongly  motivated  to  do  well  on  the  SQT.  It  was  also  assumed  that  those 
who  said  that  the  SQT  was  important  tried  harder  to  do  well.  But  when  SQT  scores 
were analyzed in relation to soldiers' answers to these questions, no relationship was
found.  The  absence  of  an  association  between  SQT  scores  and  soldier  motivation 
as  measured  here  was  unexpected.  It  may  be  that  our  questions  simply  did  not  reflect 
soldier  motivation.  But  another  explanation,  and  in  our  opinion,  a  more  plausible 
one,  has  to  do  with  the  experience  of  taking  a  test.  For  most  people  in  our  society, 
taking  a  test  is  a  stressful  situation  in  which  one  tries  to  do  their  best.  We 
suspect  that  this  also  is  true  of  soldiers  taking  an  SQT.  Thus,  differences  in 
motivation,  such  as  our  questions  were  designed  to  probe,  are  overwhelmed  and 
washed  out  in  the  test  situation  itself. 

This  suggests  that  differences  in  motivation  impact  on  test  scores,  if  at  all, 
through  prior  efforts  to  prepare  for  the  test.  But  among  skill  level  one  soldiers, 
training  for  the  SQT  seems  not  as  much  a  matter  of  individual  motivation  as  of 
unit  emphasis.  Two  indices  of  unit  training  are  the  timely  delivery  of  SQT 
Notices  to  soldiers  and  the  amount  of  training  for  the  SQT.  Both  of  these  indices 
are  positively  correlated  with  the  SQT  scores  as  shown  in  Table  5. 

(Table  5  About  Here) 

The SQT Notice informs soldiers on just what tasks are to be tested on the
SQT. The correlations between training time and SQT scores may seem surprisingly
low. But an important element here is the selectivity in training soldiers for
the SQT. The training, at least in many instances, is based upon diagnostic
information about soldier performance. Those soldiers who are identified as
nonperformers of a task tend to get the most training.

Summary  and  Conclusions 

In  summary,  the  major  results  of  this  analysis  are  as  follows: 

1. The readability of the ten SQT included in this study generally matches
the  reading  ability  in  the  target  soldier  population.  Thus,  readability  would 
not  seem  to  be  a  serious  source  of  invalidity  in  SQT. 

2.  However,  soldier  reading  ability  and  SQT  readability  are  important 
factors  in  SQT  performance.  Evidence  for  this  conclusion  comes  from  reviews 
of  SQT  by  outside  experts,  from  soldier  responses  indicating  difficulty  in 
understanding  the  skill  component  of  the  SQT,  and  correlations  of  AFQT  and 
SQT  scores. 

3.  Individual  soldier  motivation  as  expressed  in  military  career  intentions, 
attitudes  toward  the  MOS,  and  opinions  about  the  importance  of  the  SQT  was  not 
shown  to  be  related  to  SQT  scores. 

4. Unit emphasis on the SQT as reflected in the early delivery of the SQT
Notice  and  training  of  soldiers  was  related  to  SQT  scores. 

5.  With  respect  to  the  question  of  major  interest  here,  the  findings 
are  negative.  This  question  concerned  the  association  between  soldier  reading 
ability  and  SQT  performance,  as  the  association  might  be  modified  by  other 
factors.  There  is  evidence  that  SQT  scores  are  associated  with  reading  ability. 
But  our  data  did  not  give  evidence  that  this  association  is  modified  by  soldier 
motivation  or  unit  emphasis  on  the  SQT. 
Table 2

MATCH OF SQT READABILITY WITH
SOLDIER READING ABILITY

MOS     SQT READING      SOLDIER READING
        GRADE LEVELS     GRADE LEVELS

11H          6                 9
12C          5                 8
13B          6                 8
16P          8                 8
19E          6                 8
31M          7                 8
55B          9                 8
67N          8                10
71L          8                11
76X          9                 8
Table 3

CORRELATION OF REPORTED DIFFICULTY WITH
SOLDIER SCORES ON THE WRITTEN
PORTION OF THE SQT

MOS     CORRELATION WITH
        SC SCORES*

11H         -.30
12C         -.34
13B         -.39
16P         [illegible]
19E         -.09
31M         -.02
55B         -.18
67N         -.19
71L         0.00

*Negative correlations indicate that some difficulty resulted in lower
scores.
TABLE 4

CORRELATIONS OF SQT SCORES WITH AFQT SCORES

MOS      N     SQT SCORES     SC SCORES

11H     31        .43*           .41*
12C     12        .26            .47
13B     27        .42*           .39*
16P     22        .22            .32
19E     55        .22            .37*
31M     17        .48*           .41*
55B     27        .30            .25
67N     46        .22            .21
71L     27        .31            .35*

*Statistically significant


Table 5

CORRELATION OF SQT SCORES WITH NUMBER OF WEEKS
SOLDIERS HAD SQT NOTICE AND NUMBER
OF HOURS SPENT TRAINING

MOS     SQT NOTICE            TRAINING TIME

11H        .43*                   .40*
12C        .61*                   .21
13B        .24                    .28*
16P        [illegible]            .25*
19E        .28*                   .09
31M        .15                    .13
55B        .40*                   .22
67N        .15                   -.02
71L        .14                    .26
76X        Insufficient Data      .21

*Significant at .05 level


Figure 1. Model of reader performance (Klare, 1973). Surviving label: THE READABILITY LEVEL OF MATERIAL.
FORECASTING ARMY OFFICER RETENTION PRIOR TO COMMISSIONING¹


Richard  P.  Butler,  Ph.D.,  Chief,  Research  Branch 
Mr.  Claude  F.  Bridges,  Research  Psychologist 
Mr.  John  W.  Houston,  Statistician 

Office  of  the  Director  of  Institutional  Research 
United  States  Military  Academy 
West  Point,  New  York  10996 


Valid  selection  instruments  may  help  the  Army  to  diminish  the  problem  of 
retaining  qualified  officer  personnel  after  their  initial  tours  of  duty.  This 
report  describes,  validates,  and  updates  information  on  the  Military  Career 
Commitment  Gradient  (MCCOG),  an  instrument  designed  to  predict,  prior  to  com¬ 
missioning,  whether  an  individual  will  remain  on  active  duty  after  his  initial 
tour.  The  MCCOG  was  administered  to  the  U.S.  Military  Academy  Classes  of 
1966,  1967,  and  1969  shortly  before  commissioning,  and  to  the  Classes  of  1970, 
1971,  and  1972  one,  two,  and  three  years,  respectively,  before  commissioning. 
The  MCCOG  scores  for  all  six  classes  were  significantly  correlated  with  the 
criterion--on active duty versus not on active duty--gathered seven to nine
years  later.  The  coefficients  ranged  from  .17  when  the  MCCOG  was  given  three 
years  prior  to  commissioning,  to  .54  when  the  MCCOG  was  administered  just 
prior  to  commissioning.  Possible  uses  of  the  MCCOG  for  any  commissioning 
program  are  discussed. 


¹Any conclusions in this report are not to be construed as official U.S.
Military  Academy  or  Department  of  the  Army  positions  unless  so  designated  by 
other  authorized  documents. 
FORECASTING ARMY OFFICER RETENTION PRIOR TO COMMISSIONING


A  professional  officer  corps  can  be  developed  and  maintained  only  by  re¬ 
taining  those  officers  who  are  qualified  and  motivated  to  lead.  Perhaps  no 
other  aspect  of  developing  a  professional  officer  corps  has  been  of  greater 
concern  than  has  the  problem  of  retaining  qualified  personnel  after  their 
initial  tour  of  service  (Edmonds,  1972).  One  method  of  overcoming  this  prob¬ 
lem  is  to  identify,  prior  to  commissioning,  those  individuals  who  are  not 
likely to remain on active duty after their initial tour of service. Withholding
of commissioning, or career counseling and programs designed to increase
motivation for a service career, could then be initiated if thought useful.

Previous  research  to  identify,  prior  to  commissioning,  individuals  likely 
to  leave  active  duty  upon  completing  their  initial  tour  has  not  been  preva¬ 
lent.  The  only  well-controlled  studies  that  the  authors  are  aware  of  were 
completed  by  Shenk  (1972,  1973)  and  Butler  and  Bridges  (1978).  Shenk  investi¬ 
gated  the  officer  input  from  the  various  Air  Force  commissioning  programs 
concerning  the  predictability  of  their  future  status  (on  active  duty  versus 
not on active duty). The major finding was that the correlation between a
five-point  career  intent  question  and  status  five  years  later  was  .24,  a  coef¬ 
ficient  that  Shenk  considered  rather  low.  Butler  and  Bridges,  using  a  one- 
question  instrument  called  the  Military  Career  Commitment  Gradient  (MCCOG)  for 
two  classes  at  the  U.S.  Military  Academy,  found  that  it  had  significant  corre¬ 
lations  of  .54  and  .39  with  status  seven  years  later. 
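
A correlation between a continuous score and a two-valued status (on active duty versus not) is a point-biserial correlation, which is numerically the same as a Pearson correlation with the status coded 1 and 0. The sketch below is an illustration under that assumption; the function name, the coding, and the sample data are hypothetical and are not taken from the report.

```python
import math
from statistics import mean, pstdev


def point_biserial(scores, status):
    """Correlate continuous scores with a binary criterion coded 1/0.

    scores: commitment scores, one per person.
    status: 1 = still on active duty, 0 = resigned (coding is an assumption).
    Formula: (mean of group 1 - mean of group 0) / SD of all scores * sqrt(p * q).
    """
    if len(scores) != len(status):
        raise ValueError("scores and status must have the same length")
    stayed = [s for s, y in zip(scores, status) if y == 1]
    left = [s for s, y in zip(scores, status) if y == 0]
    if not stayed or not left:
        raise ValueError("both criterion groups must be represented")
    spread = pstdev(scores)
    if spread == 0:
        raise ValueError("scores have no variance")
    p = len(stayed) / len(scores)
    return (mean(stayed) - mean(left)) / spread * math.sqrt(p * (1.0 - p))


if __name__ == "__main__":
    # Invented data for illustration only.
    print(round(point_biserial([10, 20, 30, 40], [0, 0, 1, 1]), 4))
```

Because the criterion has only two values, the attainable size of such a coefficient also depends on how evenly the group splits between stayers and leavers.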

The  purpose  of  this  report  is  to  describe,  revalidate,  and  update  informa¬ 
tion  on  the  MCCOG,  an  instrument  designed  to  predict,  even  before  commission¬ 
ing,  the  probability  of  an  individual  remaining  on  active  duty  beyond  his/her 
initial  tour. 

METHOD 

Instrument 

The  MCCOG  (Table  1)  is  a  one-question  instrument  designed  to  measure  pre¬ 
cisely  the  strength  of  an  individual's  commitment  to  a  military  career.  The 
rationale  upon  which  various  characteristics  of  the  MCCOG  are  based  is  orga¬ 
nized  around  the  major  problems  of  measurement  peculiar  to  rating  scales — such 
as  controlling  the  set,  providing  meaningful  and  practical  scaling  of  scores, 
and  insuring  that  respondents  rate  the  same  thing  and  perceive  and  use  the 
same  scale.  Of  special  import  is  the  length  of  the  response  scale.  Instead 
of  the  usual  two  to  five  defined  areas  as  alternatives,  the  scale  consists  of 
99  scaled  points  on  a  gradient  with  definitions  placed  at  19  locations  in 
terms of the normal probability distribution (like T-score percentiles), and
with  13  verbal  descriptions  selected  on  the  basis  of  empirical  data  as  to 
relative  level  of  meaning  consistently  implied. 
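
One way to read "in terms of the normal probability distribution (like T-score percentiles)" is that each probability label sits at the scale position given by its normal deviate. The sketch below shows that conversion under the conventional T-score definition (mean 50, standard deviation 10); that the MCCOG layout follows exactly this rule is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def percent_to_t_score(percent):
    """Convert a probability in percent to a T-score position.

    Uses the standard normal inverse CDF and the conventional
    T = 50 + 10 * z scaling. That the instrument's gradient was laid
    out this way is an assumption of this sketch.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percent / 100.0)
    return 50.0 + 10.0 * z


if __name__ == "__main__":
    for label in (0.1, 10, 50, 90, 99.9):
        print(label, round(percent_to_t_score(label), 1))
```

Under this reading the 50% label falls at the midpoint of the gradient, and labels near 0% and 100% are spread far apart, which gives respondents more room to distinguish degrees of near-certainty.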

With  such  a  large  number  of  continuous  scaled  response  points,  reliability 
is  quite  high.  The  reliability  coefficient  of  the  MCCOG  scores,  based  on 
test-retest  data,  freed  of  temporal  change  effects  (Heise,  1969),  was  .83  for 
a  sample  of  564  respondents.  Comparisons  of  validity  data  on  MCCOG,  on  the 
usual  five  alternatives  to  essentially  the  same  question,  and  on  a  semantic 
differential  type  scale  developed  by  the  ARI  Sample  Survey  Group,  suggested 
that  the  much  higher  validity  of  the  MCCOG  is  at  least  partially  due  to  its 
high  reliability. 
Table 1.


Military  Career  Commitment  Gradient 


This  item  concerns  the  intensity  of  your  desire  for  a  career  as  an 
officer  in  the  military  service.  It  consists  of  (1)  a  question 
and  (2)  a  response  gradient  extending  continuously  between  two 
defined  extreme  values. 

Selected  areas  on  the  gradient  are  described,  both  verbally  and  in 
terms  of  probabilities,  to  provide  you  with  some  meaningful 
reference  points  and  to  provide  for  more  precision  in  scalar 
interpretation. 

At  selected  scalar  points,  percentages  beside  the  gradient  indicate 
the  judged  probability  (number  of  judged  chances  in  100)  of  one 
voluntarily  continuing  his  active  military  career  until  mandatory 
retirement.  Note,  however,  you  definitely  should  NOT  limit  your¬ 
self  to  the  few  points  for  which  descriptions  are  provided. 

Because  of  the  procedures  for  analyzing  this  item,  it  is  very 
important  that  you  follow  these  instructions  precisely,  step  by  step. 

INSTRUCTIONS. Complete each step before going to the next one.

Step  One.  Thoughtfully  read  the  question  in  the  box  below: 


QUESTION: 

To  what  degree  are  you  now  certain  that  you  will 
continue  an  active  military  career  until  mandatory 
retirement?  


Step Two. At the bottom of the gradient, on the opposite page,
read the definition of that extreme point on the gradient.

Step Three. At the top of the gradient, read the definition
of that extreme point.

Step Four. At the middle of the gradient, the 50% probability
point, read the description of that point.

Step  Five.  Locate  the  general  area  on  the  gradient  which  seems 
to  correspond  best  with  your  current  commitment;  thoughtfully  read 
the  descriptions  of  the  near  points  and  check  the  space  on  the 
gradient  that  most  closely  represents  your  current  level  of 
commitment.  Do  NOT  limit  yourself  to  the  few  points  described  verbally. 

Step Six. Select the coded letter and number combination at
the left of the checked space on the gradient. Enter this as illustrated
below. For example, if you had checked the space coded "f1,"
you would mark the answer sheet as follows:

[Answer-sheet illustration not reproduced]
(Continuing Table 1)

CODE (top to bottom of the gradient): j5, j4, j3, j2, j1, i5, i4, i3, i2, i1,
h5, h4, h3, h2, h1, g5, g4, g3, g2, g1, f5, f4, f3, f2, f1, e5, e4, e3, e2, e1,
d5, d4, d3, d2, d1, c5, c4, c3, c2, c1, b5, b4, b3, b2, b1, a5, a4, a3, a2, a1


MILITARY CAREER COMMITMENT GRADIENT

[label illegible]  There is infinite probability that I will continue my active
military career as long as I possibly can; a career as an officer
in active military service is more important to me than is anything
else in the world. There is absolutely no chance at all
that anything in the world could ever develop that could cause
me to voluntarily resign.

99.995%

99.9%  I am virtually certain that I will continue my active military
career as long as I am allowed to do so--that I will NOT
voluntarily resign.

99%  I am almost certain I will make a continuing military career if
possible.

95%

90%  I am confident that I will make a continuing military career and
NOT voluntarily resign.

75%  I am very likely to continue my military career as long as possible.

65%  I probably will remain in the military service after completion of
my military obligation as an officer.

50%  I am not inclined the least bit either way at present.

35%  I am not sure but probably will resign after completing my military
obligation as an officer.

25%  I am very likely to resign when I can honorably do so after completing
my military obligation as an officer.

15%

10%  At this time, I am confident I will resign my commission after
completing my military obligation.

5%

1%  As of now, I am almost certain that I will get out of the military
service as soon as I possibly can.

0.1%  I am virtually certain that I will resign when I can.

0.005%

[label illegible]  In my personal feelings, attitudes and thoughts, I am utterly
committed to a completely non-military occupational career and
life as soon as it is at all possible. There is absolutely no
possibility whatsoever that I will continue as an officer in
the military service beyond my minimal obligated military duty.
The MCCOG was developed to reflect the result of the interaction of both
intent to pursue and intensity of desire for a military career. Focusing the
instrument on expressed intentions is supported by a literature review concerning
employee turnover by Porter & Steers (1973). They found that expressed
intentions concerning future participation in an organization appeared
to be a better predictor than measures of job satisfaction. For example,
Porter & Steers cited Kraut (1970) who, in a study of managerial personnel,
found significant relationships between expressed intent to stay and subsequent
employee participation. The correlations were far stronger than those
found between expressed satisfaction and continued participation. Furthermore,
Atchison & Lefferts (1972), in a study of turnover among Air Force pilots,
found that turnover was significantly related to the frequency with which
pilots thought about leaving their jobs.

Subjects  and  Procedure 

The  MCCOG  was  administered  shortly  before  commissioning,  which  takes  place 
at  graduation,  to  cadets  in  the  U.S.  Military  Academy  Classes  of  1966,  1967 
and  1969,  and  in  August  1969  to  the  Classes  of  1970-71-72.  The  advantage  of 
the  August  1969  administration  was  that  it  allowed  the  predictive  power  of  the 
MCCOG  to  be  assessed  for  Classes  which  were  one,  two,  and  three  years  from 
graduating  and  commissioning.  The  results  of  the  Classes  of  1966  and  1967 
administrations  were  reported  earlier  by  Butler  and  Bridges  (1978),  but  will 
be  included  in  the  current  report  to  give  a  more  complete  picture. 

Usable returns that could be paired with the criterion of status were
received from 396 graduates from the Class of 1966, 465 (Class of 1967), 550
(Class of 1969), 539 (Class of 1970), 611 (Class of 1971), and 708 (Class of
1972). The samples were less than the number of graduates in each class
because only those graduates with MCCOG scores who were commissioned in the Army
and were still on active duty, or who were commissioned in the Army and had
voluntarily (unqualified) resigned from active duty were included in the
analysis.  Those  graduates  who  were  allied  cadets,  commissioned  in  other 
services,  involuntarily  separated,  retired  for  disability,  or  otherwise  sepa¬ 
rated  (including  deaths),  were  not  included  in  the  samples.  Graduates  from 
the  Classes  of  1966  and  1967  incurred  mandatory  four-year  military  obligations 
(the  other  classes  had  five-year  obligations),  after  which  they  were  free  to 
resign  their  commissions,  unless  they  had  incurred  additional  obligations 
because  of  additional  education. 

The  status  criterion  for  the  Classes  of  1966  and  1967  was  obtained  in 
1973  and  1974,  respectively,  seven  years  after  the  MCCOG  was  given.  The 
criterion  for  the  remaining  classes  was  gathered  in  1978,  nine  years  after  the 
MCCOG  was  administered.  Thus,  the  time  after  completion  of  their  mandatory 
obligation  ranged  from  one  year  for  the  Class  of  1972  to  four  years  for  the 
Class  of  1969. 
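
The intervals quoted above follow from simple date arithmetic. The sketch below recomputes them from the class years, obligation lengths and criterion years stated in the text. The dictionary layout and the function name are illustrative assumptions, and the calculation ignores any additional obligation incurred through additional education.

```python
# Class year -> (years of mandatory obligation, year the status criterion
# was gathered), as stated in the text above.
CLASSES = {
    1966: (4, 1973),
    1967: (4, 1974),
    1969: (5, 1978),
    1970: (5, 1978),
    1971: (5, 1978),
    1972: (5, 1978),
}


def years_free_to_resign(class_year):
    """Years between the end of the mandatory obligation and the criterion."""
    obligation, criterion_year = CLASSES[class_year]
    return criterion_year - (class_year + obligation)


for year in sorted(CLASSES):
    print(year, years_free_to_resign(year))
```

For the classes whose criterion was gathered in 1978, the result runs from one year (Class of 1972) to four years (Class of 1969), as the paragraph states.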


Data  Analysis 

Point-biserial correlations and expectancy tables were used to express the
relationships between MCCOG scores and status (scored 0/1). To correct the
correlations to a common base with a possible maximum correlation of 1.00, the
obtained values were divided by the maximum rpb possible for the proportions
in the two groups.
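
A minimal sketch of this kind of correction is given below. The report does not say which formula was used for the maximum rpb, so the sketch assumes the common normal-theory bound (the ordinate of the normal curve at the cut point divided by the square root of pq). The function names are illustrative, and the inputs are the rounded Class of 1967 group summaries from Table 2, so the output need not reproduce the published value exactly.

```python
import math
from statistics import NormalDist


def point_biserial(n1, mean1, sd1, n0, mean0, sd0):
    """Point-biserial r between a 0/1 status and a score, from group summaries."""
    n = n1 + n0
    p, q = n1 / n, n0 / n
    # Total score variance = within-group part + between-group part.
    within = p * sd1 ** 2 + q * sd0 ** 2
    between = p * q * (mean1 - mean0) ** 2
    return (mean1 - mean0) * math.sqrt(p * q) / math.sqrt(within + between)


def max_point_biserial(p):
    """Assumed upper bound on r_pb when a normal score is split p : (1 - p)."""
    z = NormalDist().inv_cdf(p)
    return NormalDist().pdf(z) / math.sqrt(p * (1 - p))


def corrected_r(r, p):
    """Rescale r so that the largest attainable value becomes 1.00."""
    return r / max_point_biserial(p)


# Class of 1967: active (N, mean, SD) versus resigned (N, mean, SD).
r = point_biserial(271, 59, 11, 194, 51, 13)
p_active = 271 / (271 + 194)
print(round(r, 3), round(corrected_r(r, p_active), 3))
```

Dividing by the bound matters most when the two groups are unequal in size, because the attainable maximum shrinks as the split moves away from 50/50.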


RESULTS 


Table  2  shows  that  the  relationship  between  MCCOG  scores  and  status  for 
all  class  years  is  statistically  significant.  As  one  might  expect,  the  cor¬ 
relations  were  higher  for  Classes  (1966-67-69)  administered  the  MCCOG  just 
prior  to  graduation/commissioning. 


Table 2: Relationship Between MCCOG Scores and Status

Class                        MCCOG        Corrected
Year    Status       N     Mean    SD        rpb

1966    Active      257     60     11      .54***
        Resigned    139     45     12

1967    Active      271     59     11      .39***
        Resigned    194     51     13

1969    Active      279     56     12      .35***
        Resigned    271     49     13

1970    Active      323     50     16      .26**
        Resigned    216     43     17

1971    Active      321     54     16      .23**
        Resigned    290     48     18

1972    Active      409     56     16      .17*
        Resigned    299     52     15

*p < .10; **p < .02; ***p < .001


The expectancy tables shown in Tables 3, 4, and 5 give clear illustrations
of the relationships between MCCOG scores and status. They indicate that
individuals in the Classes of 1966 and 1969 who scored 61 or above on the MCCOG
were very likely to be still on active duty, and those scoring 50 or below
were likely to have resigned from active duty. For the Class of 1972 the
relationship is not as sharp, but there is a definite tendency for individuals
to remain on active duty as MCCOG scores increase. For brevity, the data from
the Classes of 1967, 1970, and 1971 are not presented, but the results were
similar to the classes that were presented.


Table 3. Relationship Between MCCOG Scores and Status
(Class of 1966)

MCCOG            Number Scoring
Score            in this Range

81 or over             13
71-80                  30
61-70                  84
51-60                 159
41-50                  93
1-40                   17

[Bar chart: Percent Remaining on Active Duty (scale 0 to 100) for each MCCOG
score range; the bars are not recoverable.]
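
An expectancy table of this kind can be built by grouping respondents into score bands and computing the share in each band who remained on active duty. The sketch below uses the score bands of Table 3. The records are hypothetical, integer scores are assumed, and the function name is illustrative.

```python
# Score bands as used in Table 3: (low, high, label), inclusive bounds.
BANDS = [
    (81, 100, "81 or over"),
    (71, 80, "71-80"),
    (61, 70, "61-70"),
    (51, 60, "51-60"),
    (41, 50, "41-50"),
    (1, 40, "1-40"),
]


def expectancy_table(records):
    """records: iterable of (score, active) with active = 1 or 0.

    Returns one (label, number in band, percent active) row per band;
    the percent is None when nobody scored in the band.
    """
    rows = []
    for low, high, label in BANDS:
        in_band = [active for score, active in records if low <= score <= high]
        n = len(in_band)
        pct = 100.0 * sum(in_band) / n if n else None
        rows.append((label, n, pct))
    return rows


# Hypothetical records for illustration only (not data from the study).
sample = [(85, 1), (74, 1), (66, 1), (63, 0), (55, 1),
          (52, 0), (45, 0), (44, 1), (30, 0)]
for label, n, pct in expectancy_table(sample):
    print(f"{label:<12}{n:>4}  {pct}")
```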


DISCUSSION 

The  prime  purpose  of  this  study  was  to  determine  the  validity  of  the  MCCOG, 
when  given  prior  to  commissioning,  as  a  predictor  of  retention  on  active  duty 
beyond  the  obligatory  first  tour  of  duty.  The  results  indicated  that  the 
MCCOG  is  a  valid  predictor  of  retention,  even  when  administered  several  years 
prior  to  commissioning  and  even  when  the  criterion  is  gathered  nine  years 
later. 

The  findings  support  the  theory  that  the  individual’s  direct  estimate  of 
his  future  tenure  is  a  good  predictor  of  turnover,  especially  when  determined 
with  high  precision.  Kraut  (1975)  states  that  the  individual  himself  is  the 
best  vehicle  for  properly  weighting  and  integrating  the  factors  that  go  into 
his  decision  to  quit  or  remain  in  a  job.  He  believes  that  the  individual  can 
provide  the  best  synthesis  of  attitude  toward  his  work  situation,  his  oppor¬ 
tunities  elsewhere,  and  other  aspects  of  his  life  that  bear  on  his  decision  to 
remain  in  a  job.  The  MCCOG  follows  Kraut’s  logic  quite  well.  It  is  a  direct 
measure  of  intentions  and  calls  for  the  respondent  to  integrate  all  the  vari¬ 
ous  aspects  that  influence  his/her  intent  to  be  a  career  Army  officer. 

The  findings  imply  that  the  MCCOG  might  be  a  useful  instrument  for  several 
purposes  for  any  source  of  commissioning.  First,  in  research  for  which  the 
ultimate  criterion  is  whether  or  not  an  individual  will  remain  on  active  duty, 
the  MCCOG  can  function  as  an  intermediate  criterion  that  is  available  prior  to 
commissioning.  Second,  as  a  measure  of  commitment,  it  might  be  a  valuable 
tool  in  evaluating  programs  designed  to  increase  motivation  for  a  service 
career.  For  example,  the  MCCOG  might  be  administered  before  and  after  a  sum¬ 
mer  training  program.  Third,  in  manpower  planning,  the  MCCOG  scores  of  a 
group  give  a  fairly  accurate  appraisal  of  the  number  of  individuals  that  are 
likely to remain on active duty. Fourth, the MCCOG might be an aid in the
career counseling of individuals. A fifth potential use is in the admissions
area, but the MCCOG could be administered to applicants only if it can be
corrected for the expected high response bias. However, it could
serve  as  the  criterion  to  be  predicted  by  another  instrument  that  could  help 
to  differentiate  those  applicants  who  are  likely  to  develop  high  commitment  to 
a  military  career  as  opposed  to  those  who  are  not. 

1. SUMMARY.

This report presents the results of the training effectiveness analysis
conducted on a training program designed to transition an armored cavalry
platoon from the M113A1 to the Cavalry Fighting Vehicle (CFV). Soldiers in
MOS 19D (armored cavalry scout crewman) were trained in the operation of the
CFV and its major systems in preparation for participation in an FDTE at Ft
Knox. Data analyzed included results of written and hands-on tests, results
from training on the prototype Conduct of Fire Trainer (COFT) and results from
live fire exercises involving the 7.62mm coaxially mounted machine gun and the
25mm automatic cannon. Relationships between these data and soldier
demographics, aptitude and attitude were investigated. Problem areas were
observed in the soldiers' proficiency at acquiring, engaging and hitting
targets with the CFV's weapons systems. Implications for CFV crewman selection
criteria were drawn from this analysis.


2.  BACKGROUND. 

a.  The  CFV  is  designed  to  replace  the  M113A1  as  the  armored  cavalry's 
fighting  vehicle.  It  is  basically  the  same  as  the  Infantry  Fighting  Vehicle 
(IFV) with the rear compartment configured to accommodate an increased basic
load of TOW missiles and a decreased number of passengers. (The IFV will
carry seven men in the rear compartment; the CFV two.) The CFV is armed with
a 25mm automatic cannon, a coaxially mounted 7.62mm machine gun and a two-pod
TOW  launcher. 

b.  The  IFV  underwent  an  Operational  Test  (OT)  II  from  15  October  1979  to 
26  November  1979.  Cavalry  operations  were  not  examined  in  enough  detail 
during  this  test  and  a  CFV  FDTE  was  scheduled  at  Ft  Knox,  KY  from  2  June  1980 
through  8  August  1980  to  address  these  items.  A  cavalry  platoon  stationed  at 
Ft  Knox  was  selected  to  participate  in  the  FDTE.  This  platoon  was  judged 
ARTEP-qualified  on  the  M113A1  prior  to  beginning  the  training  program  designed 
to  transition  it  from  the  M113A1  to  the  CFV.  Five  crews  of  five  men  each  were 
organized  from  the  platoon's  19D  MOS  (cavalry  scout  crewman)  personnel  to  be 
trained  as  CFV  crewmen.  The  five  crew  positions  are:  track  commander  (TC), 
gunner, senior scout (SS), junior scout (JS) and driver. These twenty-five
crewmen are referred to in this report as "players".

c. The training period was divided into two phases, individual and
collective training, with each phase lasting approximately three weeks.
During individual training, all crewmen received the same instruction
conducted primarily in a classroom or motor pool. Training during the
collective phase was more specific to duty position and was conducted on
tactical ranges.

3. SCOPE.


a.  The  individual  training  was  primarily  a  combination  of  lecture, 
demonstration  and  practice.  Training  devices  and  aids  used  were  the  CFV, 
the  Conduct  of  Fire  Trainer  (COFT),  printed  handouts  and  draft  manuals.  At 
the  conclusion  of  a  training  session  or  block  of  instruction,  a  hands-on  test 
was given. The tests were administered by instructor personnel and were
scored on a Go/No-Go basis. Personnel who failed to achieve a Go on a test
were required to prepare themselves on their own for a retest. A second
failure resulted in instructor assistance to prepare for another retest. At
the end of the individual phase of training, the more important of these
hands-on tests were given again in a two-day comprehensive examination.

b.  Training  on  the  COFT  was  conducted  during  the  individual  phase  with 
all  twenty-five  players  participating.  Each  player  was  presented  a  series  of 
targets on the COFT and the results of his engagements were recorded. Not all
players received the same amount of COFT training due to player absences and
time  constraints. 

c. The collective training phase was conducted on tactical ranges at Ft
Knox.  During  this  phase  of  instruction  the  players  participated  in  a  series 
of  exercises  which  required  the  performance  of  several  tasks  to  meet  the 
objective(s)  of  the  exercise.  For  example,  to  successfully  complete  the  range 
determination  exercise,  a  crew  was  required  to  acquire  a  target  within  a  given 
time  standard  and  the  gunner  was  then  required  to  determine  the  range  to  the 
target  within  another  set  of  standards. 

d.  These  exercises  were  used  as  the  hands-on  tests  during  the 
collective phase. However, in most cases personnel who failed to achieve a
"Go" on a given exercise were not re-tested. This was due to time constraints
beyond the control of the instructors. During this phase, the crews also
participated in live firings of the 7.62mm COAX, the 25mm main gun and the TOW
missile.

4.  INDIVIDUAL  TRAINING  PHASE. 

a.  At  the  end  of  a  task  training  session,  the  students  were  tested 
individually on the task(s) covered. The tests, referred to as "Class HOT",
were hands-on and were scored as "Go" or "No Go". At the end of the
individual phase, the majority of these tasks were re-tested. This group of
tests is referred to as "comprehensive HOT" or "Comp HOT".

b. Data from these tests show that there were 498 Go's on 507 first
tries of the class hands-on tests, which gives a first-time-Go rate of 98%.
For the comp HOT, there were 349 Go's on 364 first tries for a first-time-Go
rate of 96%. These rates indicate that the training provided the necessary
skills to the players for successful performance of these tasks at the time
they were tested. Problems were observed during subsequent training
(especially the COAX and 25mm live fire exercises) involving the following
tasks taught during this phase:

Loading the COAX and 25mm weapons.

Performing  immediate  action  and  misfire  procedures  on  the  COAX  and 
25mm  weapons. 

c.  In  loading  the  25mm  gun,  the  players  experienced  difficulty  in 
properly  connecting  the  ammo  feed  chutes.  Improper  connection  of  these  chutes 
caused  many  stoppages  which  had  to  be  corrected  by  the  instructors  or,  in  some 
instances,  contractor  representatives.  The  players  were  also  unable  to 
consistently  perform  the  misfire  and  immediate  action  procedures  in  the  proper 
sequence. More training and/or practice at the performance of these tasks is
needed so that the skills can be retained by the soldiers. The connection of
the  ammo  feed  chutes  should  be  especially  addressed  because  of  the 
manipulation  skills  required. 

d.  During  the  individual  training  phase  the  students  were  tested  and 
determined  to  be  representative  of  the  overall  19D  MOS  Army  population  in 
terms  of  aptitude  and  demographics. 

5.  COLLECTIVE  TRAINING  PHASE. 


a.  During  the  collective  phase  of  training,  tasks  were  taught  which 
involved  part  or  all  of  a  crew.  Related  tasks  were  tested  in  an  exercise 
given  when  the  players  had  completed  training  on  the  tasks.  The  results  of 
these  exercises  show  that  there  were  478  Go's  on  629  tries  for  a  Go  rate  of 
76%. The areas with the smallest Go rates were target acquisition (59%),
range determination by gunner (35%), thermal mode operations (61%) and the
COAX and 25mm live fire exercises (51% combined).
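
The Go rates quoted for the individual and collective phases can be rechecked from the reported counts. The short sketch below does so; the function and dictionary names are illustrative, and the counts are the ones given in the text.

```python
def go_rate(gos, tries):
    """Percentage of tries scored "Go", rounded to the nearest whole percent."""
    return round(100 * gos / tries)


# (Go's, tries) as reported in the text.
RESULTS = {
    "class HOT, first tries": (498, 507),
    "comp HOT, first tries": (349, 364),
    "collective exercises": (478, 629),
}

for name, (gos, tries) in RESULTS.items():
    print(f"{name}: {go_rate(gos, tries)}%")
```

The drop from the high 90s in the individual phase to 76% in the collective phase is the gap the following paragraphs examine.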

b.  Other  hands-on  tests  (HOT) 


A target acquisition and engagement (TAE) hands-on test was developed by
personnel from DTD and TRASANA to assess the players' proficiency at acquiring
and engaging targets. This test was given on two occasions: at the end of the
training program (TAE I) and the end of the FDTE (TAE II). The test was
designed to simulate a CFV in an overwatch position observing a sector for
enemy targets. Prior to beginning the exercise, the crew was allowed to
observe the sector for one minute to familiarize themselves with the terrain.
During the test, in which the CFV was always stationary, the assigned TC for
the crew acted as TC while each crew member was tested as gunner. Then the
assigned gunner acted as TC while the TC was tested as gunner. As each pair
was being tested, the remaining crew members were prevented from observing the
sector.

c.  TAE  I. 

1. Each TC/gunner pair was tested for one iteration in which four
targets were presented. These were full-scale targets mounted on SAAB devices
which were raised and lowered by the test administrators. Two test
administrators  gathered  the  data  for  all  the  tested  pairs.  Both  were 
positioned  on  the  CFV  where  they  could  observe  the  soldiers  being  tested  but 
located  so  as  not  to  interfere  with  their  performance  during  the  test. 

Specific items of interest are:

° The standard for target acquisition (5 sec from target exposure) was
met 34% of the time.

° The target was correctly identified by type 42% of the time.

° The mean time for the gunner to determine the range to the target was
25.8 seconds.

° The mean time required by the gunner to determine the range to the
target after he acquired it (8.5 sec) exceeded the time standard for
acquisition plus ranging (8 sec).

° The gunners were able to correctly determine the range to the target
(within 200 meters) 60% of the time.

° The target was engaged within 20 seconds (26 seconds for TOW
engagements) 29% of the time.

° The proper ammo for the engagement was selected 38% of the time.

Analysis of the data by individual engagement revealed the following:

° The target was correctly identified and the proper ammo selected 27
times (29%).

° Tank targets were correctly identified 23 times (35%); TOW was
selected to engage these targets 11 times (48%).

° Of the 92 total engagements, time standards for all required tasks
were met seven times (8%), though all tasks were not necessarily
performed correctly.

° Of the 92 total engagements, there were only two in which each
required task was correctly performed and performed within the time
standard.
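
The standards listed above can be expressed as a simple Go/No-Go check on a single engagement. The sketch below is illustrative only: the record layout and names are assumptions, all times are assumed to be measured from target exposure, the limits are assumed to be inclusive, and the example engagement is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    acquire_s: float       # seconds from target exposure to acquisition
    range_s: float         # seconds from target exposure to a determined range
    range_error_m: float   # absolute error of the determined range, in meters
    engage_s: float        # seconds from target exposure to engagement
    used_tow: bool         # True if the TOW was selected


def score(e):
    """Return a Go (True) / No-Go (False) flag for each standard."""
    return {
        "acquisition": e.acquire_s <= 5,
        "acquisition plus ranging": e.range_s <= 8,
        "range accuracy": e.range_error_m <= 200,
        "engagement": e.engage_s <= (26 if e.used_tow else 20),
    }


# Hypothetical engagement, not taken from the test data.
example = Engagement(acquire_s=4.0, range_s=12.5, range_error_m=150,
                     engage_s=24.0, used_tow=True)
flags = score(example)
print(flags, all(flags.values()))
```

Scoring each standard separately mirrors the distinction the results draw between meeting time standards and performing the task correctly.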

2. The results of this test indicate that the players were not able to
meet the time standards set for the tasks involved. Also, the players
incorrectly identified the targets 58% of the time. Several of the players
commented that the lack of a gun tube on the target silhouettes made them
difficult to identify. While this may be true, the silhouettes used in the
test (and the live firings) were standard threat targets for Army training.
Another observation concerns the players' selection of ammo. The target was
both correctly identified by type and engaged with the proper ammo in only
29% of the engagements. More specifically, TOW was selected against correctly
identified tank targets less than half (48%) of the time. This represents a
serious shortcoming of the training program in preparing the players to engage
and defeat the enemy.


d.  TAE  II. 

1. The players' overall performance on the second test equalled or
exceeded their performance on the first test in all areas except "time to
range after acquisition" and "TOW selected against correctly identified tank
targets". The increase in time to range after acquisition was thought to be
due in part to the lack of experience at this task on the part of the scouts
and drivers. Analysis of those engagements in which an assigned TC or gunner
was tested at the gunner's position shows that the mean time for these crewmen
to range to the target after acquiring it was 9.3 seconds. The mean time for
the remaining crewmen was 14.2 seconds. Corresponding results from the first
test are 7.0 and 9.2 seconds, respectively. Both groups showed an increase in
time to perform the task from the first to the second test. Thus the
difference between the times was not caused solely by the crewmen being less
familiar with CFV turret operations. A possible explanation for the change is
that the players did not practice ranging to target within the time standard
during the FDTE frequently enough to maintain the level of proficiency they
had acquired during the training period.
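
The size of the increase for each group can be worked out from the four mean times reported above. The sketch below does this; the names are illustrative and the values are the ones given in the text.

```python
# Mean seconds to range after acquisition: (TAE I, TAE II), from the text.
MEAN_RANGE_TIME = {
    "assigned TCs and gunners": (7.0, 9.3),
    "remaining crewmen": (9.2, 14.2),
}


def change(first, second):
    """Return (change in seconds, percent change), both rounded."""
    delta = second - first
    return round(delta, 1), round(100 * delta / first)


for group, (tae1, tae2) in MEAN_RANGE_TIME.items():
    delta, pct = change(tae1, tae2)
    print(f"{group}: +{delta} s ({pct}%)")
```

Both groups slowed down, which is the basis for the conclusion that unfamiliarity with the turret alone does not explain the change.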

2. The players also showed a decrease in the percentage of engagements
for which TOW was selected against correctly identified tank targets from the
first to the second test. There was also an overall laxity in issuing and
following fire commands during the second test that was not as prevalent
during the first test, which could have been due to lack of structured,
supervised practice in this area.

e. Results of Written Skills and Knowledge (SAK) Tests.

1. The written tests were developed by TRASANA from available training
materials. These tests were not part of the plan of instruction for the
training but were given by TRASANA in an attempt to measure the players'
knowledge about the CFV in specific areas.

2. The players were given three skills and knowledge (SAK) tests during
the training and FDTE (i.e., SAK I, SAK II and SAK III). SAK I was given at
the end of the individual training phase; SAK II at the end of collective
training and SAK III at the end of the FDTE. The tests are similar but not
identical since some questions and/or choice of responses were changed, added
or deleted as the players' familiarity with the CFV increased.

3. It must be noted that the high first-time "Go" rates of the class and
comp HOT (98% and 96% respectively) discussed earlier reflect a greater level
of knowledge of CFV tasks than do the average scores on SAK I and II. The
most likely cause of this discrepancy is that the two types of tests measure
different aspects of the tasks involved. The hands-on tests (HOT) require the
manual execution of skills and knowledge in the tasks involved while the SAK
tests measure mental recall of information about the tasks. Direct correlations
between performance on the HOT and SAK tests are not meaningful since the
number of "No Go" results, and therefore the statistical variance, on the HOT
is too small.

4. Comparing the results of SAK II and SAK III reveals that the players'
scores remained virtually constant in the areas of turret operations and
weapons (categories 1 and 2) while decreasing considerably in weapons
maintenance (category 3). Thus, the players' knowledge of the items tested
did not improve, and actually decayed in category 3, during the FDTE.
Significant positive correlations were found between the scores on the two SAK
tests in categories 1 and 2 and the total test, which indicates that the
decrease in scores was not confined to those players who were less
familiar with the turret and weapons system (i.e., drivers and junior scouts).

5. The results of SAK I and SAK II were also compared to the results of
the target acquisition hands-on test given at the end of training (TAE I).
Significant correlations were found to exist between the players' scores on
categories two and three (weapons and weapons maintenance) of SAK II and their
performance at ranging to the targets during the hands-on test (higher test
scores were associated with less range time). No relationships were found
between the results of SAK I and the hands-on test.
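
A relationship such as "higher test scores were associated with less range time" shows up as a negative correlation coefficient. The report does not state which coefficient or significance test was used, so the sketch below assumes a Pearson correlation and the usual t statistic for testing it against zero. The function names are illustrative and the five data points are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical values for five players (not data from the report).
sak_scores = [10, 12, 14, 16, 18]
range_times = [12.0, 11.0, 9.0, 8.0, 5.0]

r = pearson_r(sak_scores, range_times)
print(round(r, 3), round(t_statistic(r, len(sak_scores)), 2))
```

The t value would then be compared with a tabled critical value for n - 2 degrees of freedom to judge significance.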

f.  7.62mm  COAX  firings  (crew  subcaliber  exercise). 

1. Each of the fifteen players was scheduled to serve as gunner during
three iterations of a day course (three targets) and a night course (five
targets). Due to weapon/target malfunctions and other complications, some
players traversed the course less than three times while others traversed them
more often. The COAX was used as a subcaliber training device during these
firings and a crew of three was on board the CFV during the engagements. The
targets used were half-scale threat vehicles (except for one full scale troop
target in the night course) and were set at actual distances within 1000
meters of the CFV. The CFV was moving during all engagements of the day
course and stationary during the night course. Results are shown in
Table 1.

TABLE 1

COAX ENGAGEMENTS WITHIN
TIME TO KILL STANDARDS

                                       Times      Percent
Conditions                 Exposures    Met     of Exposures

Day, single target             59        27          46
Day, multiple targets          29         8          28
Night, single target            0
Night, multiple targets        76         7           9

2. The table shows that the time to kill standards could not have been
met in more than 46% of the single target, day exposures; 28% of the multiple
target, day exposures; and 9% of the multiple target, night exposures. Thus,
the players did not consistently meet the standards for the firings.

g. 25mm firings (crew combat exercise).

1. Data from this exercise concerning accuracy of the 25mm weapon system are
classified. The numbers of targets engaged and engagement times are not
classified and are discussed in the following paragraphs.

2. The same fifteen players who served as gunners in the COAX subcaliber
firings also participated in the 25mm live fire exercise. Targets were full
scale threat vehicle silhouettes at ranges of 1000 to 1500 meters and were
mounted on SAAB devices so that they could be raised and lowered by remote
control. The firings consisted of three iterations for each of the fifteen
players of a day and a night course. Both courses were similar to those used
during the COAX exercise except that the targets were full size at longer
ranges.

3. The percentage of targets engaged during the 25mm firings shows a
substantial increase over that from the COAX exercise. Several factors
possibly contributed to this increase in target servicing proficiency. Among
these are:

° The practice received during the COAX firings provided experience
which made the players more proficient at the tasks required to
acquire and engage the targets.

° The targets were full size instead of the closer but half-scale
targets used during the COAX exercise.

° The players were more interested in, and excited about, firing the
25mm gun than the COAX.

h. Discussion of COFT training.

1. During the individual training phase, all of the players received
training in gunnery procedures using the prototype COFT*. Due to the late
inclusion of the COFT into the program, this training was given primarily as a
"filler" for time in which a player was not otherwise occupied. Therefore,
not all players received the same amount of exposure to the COFT.

2. COFT training consisted of a series of target presentations in which
the player was required to determine range and engage the target. Range
determination procedures on the COFT differ from those on the CFV in the
following areas:

° During COFT engagements, the player had only to announce the correct
range and the instructor would press a switch which caused the COFT to
adjust the impact of the rounds properly. On the CFV, the gunner must
move his right hand to the range control knob and dial in the proper
range. This motion could affect his ability to keep the target in the
sight, especially at long ranges.

° Only three different ranges are available for COFT engagements (1200,
1500, 3000m). Unlike actual firing conditions, range to the target
could easily be identified by the size of the target in the sight.
The players quickly learned the relationship of target size to range
and therefore were less careful to perform proper ranging techniques
before announcing the range. This became evident during range
determination exercises early in the collective phase, when the
players tended to identify target ranges at 1200, 1500 or 3000 meters.

3. Comparison of COFT results to live fire results.

° The COFT training was conducted differently from the live firings in
that the players were not restricted to time limits on the COFT. They
were allowed to engage the target until it was hit (or moved off the
screen for a moving target). Therefore, comparisons of mean times to
engage and hit are not meaningful. The percentage of targets hit on
the COFT in both the stationary firer-stationary target and moving
firer-stationary target modes is much greater than that achieved during
the COAX and 25mm exercises. Tests of relationships between COFT and
25mm data in both percent of hits and mean difference between hit time
and engagement time for the fifteen players who participated in the
25mm exercise found no significant correlations.

° The players engaged and hit a much greater percentage of targets during
the COFT training than during the COAX and 25mm live firings. This is
partially due to the absence of need to acquire targets on the COFT.
(Targets are readily visible on the COFT display.) Another factor
influencing these differences is that the players were not limited
during the COFT training by the time constraints enforced during the
live fire exercises.

6. ASVAB AND ATTITUDE ANALYSIS RESULTS.

a. Four ASVAB areas (OF, MM, SM and EL) and the SelectABLE scores
correlated positively with scores on all three written tests. The MM scores
were positively correlated with engagement times on the target acquisition
hands-on test (TAE I). Higher MM scores also correlated with less time to
engage after acquisition.

b.  The  players'  responses  to  the  attitude  surveys  were  predominately 
positive. Attitudes toward the training materials, CFV night/buttoned-up
operations  and  comfort  and  safety  of  the  vehicle  were  less  positive  than  those 
in  the  other  areas  surveyed. 

c.  In  general,  more  positive  attitudes  toward  the  training  were 
associated  with  better  performance,  as  measured  by  the  written  and  hands-on 
tests. 

7.  SUMMARY  FINDINGS 

a.  The  players  participating  in  the  training  program  were  representative 
of  the  overall  Army  population  of  19D  MOS  soldiers  in  terms  of  aptitude  and 
demographics. 


*This is the same COFT used during training for the IFV OT II, but is not
related to the Fighting Vehicle System (FVS) UCOFT being designed for use in
future IFV/CFV training programs.

b. The players' performance on hands-on tests given by the instructors
during  the  individual  training  phase  indicates  that  the  training  program 
provided  the  skills  necessary  for  successful  accomplishment  of  the  involved 
tasks  within  established  time  standards.  Problem  areas  involving  these  tasks 
which  arose  during  the  collective  training  phase  were: 

° Loading the COAX and 25mm weapons

° Performing immediate action and misfire procedures on the
COAX and 25mm weapons.

c. In loading the 25mm gun, the players experienced difficulty in
properly connecting the ammo feed chutes. Improper connection of these chutes
caused many stoppages which had to be corrected by the instructors or, in a few
instances, contractor representatives. The players were also unable to
consistently perform the misfire and immediate action procedures in the proper
sequence. The training program was not effective in promoting the retention of
the skills they learned in the performance of loading and misfire/immediate
action  tasks.  More  training  and/or  practice  time  is  needed  in  these  areas  to 
better  enable  the  soldier  to  retain  skills  gained  through  the  training  program. 

d.  Written  tests  given  during  the  training  and  at  the  end  of  the  FDTE 
indicated  that  the  players'  knowledge  of  the  CFV  and  related  tasks  was  not  as 
high as the results of the instructors' hands-on tests given during the training
would suggest. Also, scores on the two skills and knowledge tests given during
the training decreased in the areas not being taught during the interim and
increased in areas being taught. The practice which the players received
during the conduct of the FDTE did not increase the players' knowledge of the
CFV  and  its  systems  since  scores  on  the  written  test  given  at  the  end  of  the 
FDTE  were  the  same  or  lower  than  those  on  the  same  test  given  at  the  end  of 
training. 

e.  The  players  engaged  and  hit  a  much  greater  percentage  of  targets 
during  the  COFT  training  than  during  the  COAX  and  25mm  live  firings.  This  is 
partially  due  to  the  absence  of  need  to  acquire  targets  on  the  COFT.  (Targets 
are readily visible on the COFT display). Another factor influencing these
differences is that the players, during the COFT training, were not
limited by the time constraints enforced during the live fire exercise.

f.  Comparisons  of  results  from  the  COFT  training  and  the  target 
acquisition and engagement hands-on test given at the end of training showed no
significant difference in mean time to determine range to the target (after
gunner acquisition for the hands-on test) between the two. A significant
difference was found in mean time to engage the target (after gunner acquisition
for the hands-on test) with the players engaging targets more quickly during
the hands-on test. This could be the result of inexperience on the part of the
players when the COFT engagements used in the comparisons were conducted. A
significant positive correlation was found between mean engagement times of COFT
targets  and  corresponding  times  from  the  hands-on  test. 

g. During the COFT training, all COFT vehicular targets were engaged with
the 25mm weapon system, even at ranges of 3000 meters. The failure of the
instructors to correct this situation probably contributed to the players'
training to select the 25mm for engaging targets, including targets identified
as tanks, during the target acquisition and engagement hands-on tests.


Privacy Act and the Data Base:


Implementation  of  the  Privacy  Act 


William  B.  Camm,  Staff  Assistant  for  Tests,  Technical  Information  Division, 
US Army Research Institute for the Behavioral and Social Sciences,
Alexandria,  Virginia  22333 

The  legal  constraints  of  the  Privacy  Act  and  the  increased  legislative 
pressure  to  handle  greater  amounts  of  personal  data  faster  and  better 
pose  many  new  problems  for  social  science  research  in  the  US  Government. 
The immediate, intermediate, and long-range solutions to the dilemma can be
achieved through the development of a precautionary, systematic set of
collection and storage procedures; full use of comprehensive data bases;
and the insulation of each contributing set of data.

This paper is offered as a contribution to the current discussion
on  the  legal  constraints  of  the  Privacy  Act  of  1974  (P.  L.  93-579)  on  the 
pressing  requirement  that  the  Department  of  Defense  handle  greater  amounts 
of  complex  personnel  research  data  faster,  better,  and  at  minimal  cost. 
The  mandate  to  minimize  the  cost  of  collecting,  maintaining,  and  using 
personal  data  and,  at  the  same  time,  maximize  the  utility  of  the  collected 
data  is  articulated  in  the  Paperwork  Reduction  Act  of  1980  (P.  L.  96-511). 
The  act  became  effective  1  April  1981. 

The  technology  capable  of  cogent  management  of  information  resources 
is  here,  and  we  already  know  quite  a  bit  about  how  to  apply  it  in  terms  of 
cutting  costs,  enhancing  usefulness,  coordinating  and  sharing  common 
procedures, and improving service to management and the user.

The problem, however, is to insure that the collection, maintenance,
use, and dissemination of personal data and information are consistent
with the Privacy Act. The restrictions and limitations imposed by the
Privacy Act loom large as a potential hindrance to effective information
resource management in terms of information control, resource constraints,
cost  of  data,  added  hardware,  and  special  computer  programs.  To  date, 
neither  the  impact  of  the  new  wave  of  information  resource  management  on 
the  Privacy  Act  nor  the  constraints  of  the  Privacy  Act  on  the  Paperwork 
Reduction  Act  has  been  sorted  out.  The  Office  of  Management  and  Budget 
(OMB)  has  been  assigned  the  responsibility  for  providing  overall  direction 
in  the  development  and  implementation  of  policies,  principles,  standards, 
and  guidelines  in  all  areas  of  P.  L.  96-511.  Privacy  Act  enhancement  is  on 
the  OMB  agenda  for  April  1983. 

All  Federal  agencies  are  moving  ahead  with  their  own  interpretation 
of  P.  L.  96-511  with  the  expectation  that  the  resulting  implementation  will 
be  found  acceptable.  The  Office  of  the  Assistant  Secretary  of  Defense, 
Information  Control  Division,  has  established  DOD-wide  policy  to  insure 
compliance  with  P.  L.  96-511.  The  Army  has  established  an  Information 
Management Office under the Office of the Chief of Staff to define,
develop, and manage the Army information resources program.

Organizations  that  deal  with  personal  information  have  reached  informal 
agreements  on  most  aspects  of  the  program  but  have  avoided  special  problems 
with regard to processing and maintaining personal data--especially with
the  concept  of  an  integrated  data  base.  Nevertheless,  the  problems  of 
effective  information  resource  management  and  protection  of  individual 
privacy  are  quite  real  and  very  pressing  and  cannot  be  ignored. 


THE  PRIVACY  ACT 

The Privacy Act is imposed on executive departments, military
departments, Government and Government-controlled corporations, other
establishments in the executive branch including the Office of the President,
and independent regulatory agencies. Congress and its agencies (e.g., GAO) are
exempt. So are Federal Courts. The act limits the manner in which covered
agencies collect, use and disclose information about people. The act was
codified as 5 USC 552a in 1976. The act gives the individual the right to be
protected against the power of officials with access to data banks. There are
three related aspects to the Privacy Act rights:


Personal autonomy--the right to make a choice about personal
behavior and lifestyle.

Freedom from outside interference--the right to be left alone.

Protection of private information--the right to control where and
how information about oneself is communicated to others.

This portion of the paper focuses on the third point, control and
protection of personal information.

Since  1965,  personal  privacy  has  become  an  important  social  value, 
covered  under  tort  laws,  in  the  United  States.  Privacy  is  related  to 
personal freedom and, although rights of privacy are not expressly
mentioned in the Constitution, is supported by the Supreme Court's language
used  in  most  of  its  important  decisions.  The  constitutional  amendments 
most  commonly  cited  in  this  regard  by  the  Supreme  Court  are  the  first, 
third,  fourth,  fifth,  ninth,  and  fourteenth. 

In  1976,  an  inventory  of  Federal  data  systems  revealed  that  97  agencies 
had a total of 7,000 records systems containing nearly 4 billion dossiers.
The Department of Defense alone had 2,219 systems with 321 million
different names and records (OMB, 1976). Most of the records systems at
that time were not a matter of public record. The Privacy Act prohibits
secret files and further states that individuals should be able to find out
what information about them is contained in Federal records and how that
information is used. For example, a person is able to prevent personal
information that was given for one specific purpose from being used for
another  purpose  without  his  or  her  consent.  Provisions  will  be  made  for 
the  individual  to  correct  and  amend  personal  records  in  possession  of  the 
government. 

Government  agencies  handling  identifiable  personal  data  should  show 
that  such  data  are  reliable  and  current  and  take  positive  steps  to  prevent 
their  misuse.  Collected  data  should  also  be  safeguarded  and  securely 
stored  if  they  contain  identifiable  information.  For  research  use,  the 
connection  between  the  names  and  data  should  be  destroyed  when  no  longer 
needed.  Code  numbers  and  code  words  can  be  used  if  several  sets  of  data 
are  collected  on  the  same  person.  A  number  of  methods  for  storing  personal 
data  are  described  in  the  literature  (Boruch,  1971a,  b).  If  knowledge  of 
illegal  activities  is  requested,  anonymity  should  be  guaranteed  so  that  the 
data  cannot  be  subpoenaed  in  legal  proceedings.  Insulated  data  banks  might 
be  considered.  Research  data  are  not  automatically  privileged  information. 
There  are  a  few  exceptions,  such  as  data  regarding  drug  research.  Congress 
and  courts  may,  and  often  do,  subpoena  such  data. 

The  success  of  the  Privacy  Act  is  hard  to  measure  objectively.  The 
enforcement  of  data  protection  regulations  and  the  supervision  and  control 
of  the  collection  and  storage  of  information  about  individuals  depend, 
for  the  most  part,  on  the  good  faith  of  the  agencies  and  legal  action  by 
individuals. Congress believed that self-regulation was the best initial
method for control because it eliminated the need for an additional
government agency and, at the same time, would aid the necessary advance in the
technology of information collection and storage. Control agencies were
not  to  be  considered  unless  the  agencies  themselves  proved  that  self- 
regulation  had  failed.  The  potential  for  frustration  of  the  law  is  so 
great that the Privacy Protection Study Commission (1977) recommended that
the Privacy Act be broadened to include all items that an agency can
readily identify in all of its systems to insure compliance with the
Privacy Act. So far, Congress has not implemented the Commission's
recommendation.

Violations  of  the  Privacy  Act  are  misdemeanors  subject  to  a  maximum 
fine of $5,000. Unlike damage actions brought against an agency, criminal
penalties are imposed on the person who committed the crime. The
punishment, if there is a conviction, is applied to any

Agency officer or employee who knowingly or willfully makes
improper disclosures of information pertaining to an individual.

Agency  officer  or  employee  who  willfully  maintains  records  without 
meeting  Notice  Requirements  Requests  (i.e.,  maintains  a  secret 
system  of  records). 

Person  who  knowingly  and  willfully  requests  or  obtains  individual 
records  from  an  agency  under  false  pretenses. 

If  the  court  finds  that  an  agency  (its  officers  or  employees)  acted  in 
an  intentional  or  willful  manner,  the  complainant  may  receive  actual 
damages ($1,000 minimum). But it is difficult, in most cases, for the
complainant to show proof of intentional and willful agency misconduct.
The complainant must also show that the conduct was greater than "gross
negligence"; "ordinary negligence" on the part of the agency does not meet
requirements of the law as it is written. In addition, the complainant
must also prove actual damages by establishing that the agency's action had
a direct adverse impact upon him or her. Finally, if the individual
wins, the U.S. Treasury (not the agency or its members) is liable for the
actual damages, court costs, and attorney fees. This situation tends to
dampen the deterrent effect that civil actions may have upon data
collection practices of agencies (Bushkin & Schaen, 1975).


THE  DATA  BASE 

A  data  base  may  be  viewed  as  a  digital  computer  version  of  a  manual 
file  system.  The  manual  file  system  comprises  file  folders  identified  by  a 
name or number. The computer file consists of records, each identified by
a primary key and secondary keys, for example, name, age, rank, and Social
Security number. At this point, the computerized record system departs
from the manual system. Access to the items in the computerized system can
be  made  through  the  primary  or  any  secondary  key,  or  through  any  other 
indicator  in  the  individual  record.  Users  of  computerized  records  systems 
are  often  in  remote  locations,  and  restrictions,  like  code  names  for  the 
primary key or identification tab of a single system, no longer exist.

The recent trend, under the impetus of the Paperwork Reduction Act, is
toward  integrated  data  bases  where  a  collection  of  data  or  records  is 
linked  together  using  a  common  identification  key.  The  reason  for  the 
innovation  is  related  to  a  greater  need  for  individualized  information  and 
a  growing  proficiency  in  processing  and  interpreting  data.  Also,  as 
expected,  data  collected  for  one  purpose  is  frequently  useful  for  related 
purposes. 
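
The access and linkage pattern described in the two paragraphs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (ssn, name, rank, age, score), the sample values, and the helper functions build_index and link are hypothetical and do not describe any actual Army records system.

```python
from collections import defaultdict

# Hypothetical personnel records; every value here is invented.
personnel = [
    {"ssn": "000-00-0001", "name": "Adams", "rank": "E4", "age": 22},
    {"ssn": "000-00-0002", "name": "Baker", "rank": "E5", "age": 27},
    {"ssn": "000-00-0003", "name": "Clark", "rank": "E4", "age": 24},
]

# A second, separately collected file that happens to share the same key.
test_scores = [
    {"ssn": "000-00-0001", "score": 71},
    {"ssn": "000-00-0003", "score": 88},
]


def build_index(records, key):
    """Map each value of `key` to the list of records carrying it."""
    index = defaultdict(list)
    for rec in records:
        index[rec[key]].append(rec)
    return index


def link(left, right, key):
    """Join two files on a common identification key."""
    right_index = build_index(right, key)
    merged = []
    for rec in left:
        for match in right_index.get(rec[key], []):
            merged.append({**rec, **match})
    return merged


by_ssn = build_index(personnel, "ssn")    # access through the primary key
by_rank = build_index(personnel, "rank")  # access through a secondary key
linked = link(personnel, test_scores, "ssn")
```

Any field can serve as an access path, and any two files that share a key can be joined. That convenience is what makes the integrated data base useful, and it is also the source of the privacy concern.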

At this point, the distinction between records that relate to
individuals for the purpose of taking some sort of action concerning that
individual and records that are collected and maintained for the purpose of
planning and policy decisions should be made. The former, in the strict
sense, is termed a system of records; the latter is a statistical record.
However, most records are mixed, and it is rare to find a true statistical
record in either Government or academic research. A true statistical data
base cannot contain information that can be related to an identified
individual, and no individual contributing to the data base should be
identified with it.* The Army Research Institute for the Behavioral and
Social Sciences (ARI) collects and maintains systems of records until such
time as the data are edited, coded, stripped of the personal
identification, and entered into the data base. The ARI Systems Notice (ARI, 1980)
covers,  at  this  writing,  all  ARI  systems  of  records  of  the  moment  and  the 
future,  provided  the  data  collection  effort  remains  within  the  operational 
confines  of  the  public  notice.  If  a  new  and  different  system  of  records  is 
contemplated,  then  an  additional  notice,  or  modification  of  the  current 
notice,  will  be  required.  The  new  notice  must  be  published  in  the  Federal 
Register  at  least  90  days  prior  to  any  data  collection  for  the  new  system 
of  records. 


DATA  COLLECTION  AND  STORAGE  PROCEDURES 

The  procedure  involved  from  the  start  of  the  data  gathering  through  the 
final destruction of the system of records (i.e., removal of personal
identifiers)  and  the  publication  of  the  results  for  the  various  users  may 
theoretically  be  compromised  at  a  number  of  points  during  the  collection, 
transmission,  storage,  and  processing  of  the  data.  Nine  arbitrary  points 
are  conceptualized  here  for  the  purpose  of  illustration  in  Figure  1. 

*Insofar as the Privacy Act is concerned, however, the only operative
criterion is whether or not the agency does in practice retrieve the
information by reference to some personal identifier.

The data collection point (point 1: surveys, questionnaires, tests,
interviews, or ratings) is obvious yet frequently overlooked despite the Privacy
Act Statement at that point stating that "Full confidentiality of the
responses will be maintained in the processing of the data ..." (DA Form
4368-R). The Privacy Act requires that all agencies involved in data
collection--in the development of a data base there may be several--provide
appropriate administrative, technical, and physical safeguards. The common
threat to personal information at point 1 is the person who is authorized
to have access to the information for one purpose but who misuses that same
information for an unauthorized purpose. The entire data collection
operation, if possible, should remain under a single work group. It
is tempting for the researcher to ignore most of the problems at point 1
and go on to the second potential compromise point (transmission). The
personal information is easiest to protect in the computer-based area.
The transmission of the data (point 2) may be by messenger, mail,
telephone, or microwave and is subject to compromise during transmission and
upon receipt. Any privacy compromise here is seldom intended and is most
likely the result of careless handling. Security compromise during
transmission is not specifically treated in this paper. The editing and coding
process (point 3) is the first step in preparing the data for the computer
and is the time to check for accuracy, relevance, timeliness, and
completeness. It is also a good time to remove the personal identification in
preparation for linkage with additional information in the integrated data
base unless that linkage is necessary for the subsequent interpretation of
the data. Data transmission (point 4) to the computer area is usually less
of a risk than point 2. However, the information must be checked, edited,
and sorted and may easily be identified by resourceful people. Points 5,
6, and 7 involve checking the processing of the data being edited and
stored. During format checks of tables, graphs, and the like, careless
handling may result in compromise. Error listings are another source of
compromise at this point. The location of each item of information should
be recorded and confined to the computer area; extraneous data should be
destroyed when no longer useful. Finally, point 8 is transmission of the
data (the report) to the user, point 9. Exploitation can occur when common
and unique properties of individuals are displayed in the reports. It is
then a simple matter to sort, count, and identify individuals and/or groups
from the final report. For example, tabulation of results may yield grade
level, age, sex, location, and other properties that with
cross-tabulations identify individuals and/or groups.
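
The cross-tabulation risk at points 8 and 9 can be checked mechanically before a report is released. The following Python sketch is an assumption-laden illustration, not a procedure taken from this paper: the sample rows, the small_cells helper, and the threshold of two persons per cell are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical report rows: names already removed, properties kept.
rows = [
    {"grade": "E4", "age": 22, "sex": "M", "location": "Fort A"},
    {"grade": "E4", "age": 22, "sex": "M", "location": "Fort A"},
    {"grade": "E5", "age": 27, "sex": "F", "location": "Fort A"},
    {"grade": "E4", "age": 24, "sex": "M", "location": "Fort B"},
    {"grade": "E4", "age": 24, "sex": "M", "location": "Fort B"},
]


def small_cells(records, fields, threshold=2):
    """Return cross-tabulation cells holding fewer than `threshold` people.

    The threshold is an illustrative assumption, not a regulatory value.
    """
    counts = Counter(tuple(rec[f] for f in fields) for rec in records)
    return {cell: n for cell, n in counts.items() if n < threshold}


# A one-way table by location exposes nobody in this sample ...
one_way = small_cells(rows, ["location"])

# ... but crossing grade, sex, and location isolates a single person.
risky = small_cells(rows, ["grade", "sex", "location"])
```

A cell that holds one person identifies that person to anyone who already knows those properties, even though no name appears in the report.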


1. Data Collection

2. Data Transmission

3. Editing and Coding

4. Data Transmission

5. Data Preparation

6. Computer Processing and Storage

7. Tabulation and Display of Results

8. Data Transmission

9. Report: The results to user


Figure 1. Flow from personnel information record to final report. Numbers
represent potential compromise points.


In practice, the situation is more complicated. Longitudinal studies
which involve collecting and maintaining information over a period of time
may present problems. A statistical data base of this sort needs an
insulated method of linking recent data with data already stored. To
complicate matters, a secondary user or users are often involved. And most
problems arise--at least insofar as privacy safeguards are concerned--when
the primary user establishes the data base for administrative purposes and
the secondary user is more interested in research, or vice versa. Often,
there  is  no  relationship  of  purpose  between  the  records  system  of  one  user 
and  the  established  data  base  of  another. 


PRIVACY  SAFEGUARDS 

Privacy  safeguards  for  data  bases  are  similar  to  those  required  for 
most  records  systems.  Certain  data  bases,  for  example  those  concerned  with 
current  sensitive  issues,  such  as  medical  histories,  performance  by  ethnic 
groups,  illegal  actions  or  country  of  origin  (Barnes,  1979),  are  subject  to 
intentional invasion for several reasons by individuals whose interests
range  from  apprehension  concerning  possible  misuse,  real  or  imagined,  of 
the  information  contained  in  the  data  base  to  intelligence-gathering 
activities  of  foreign  governments.  Added  precautions  might  be  considered. 

For example, the data from MILPERCEN's proposed data base are coded
with a cryptographic code known only to MILPERCEN. The coded data plus
identifying information are sent to ARI to merge with ASVAB data, which are
also coded using the identifying information to link with the MILPERCEN
record (Figure 2). The identification is then deleted. The merged file is
given to ARI's Personnel Utilization Technical Area. MILPERCEN cannot
obtain anything other than their own data from the file, and ARI cannot
meaningfully identify data from MILPERCEN but will have the necessary
information for a validation of ASVAB. The same scheme can be used in
longitudinal studies with different independent groups. The code linkage
can  either  be  destroyed  or  stored  in  a  safe  place  beyond  the  reach  of  all 
but  extraordinary  requests. 
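
One way to picture an insulated linkage of this kind is sketched below in Python. It is a simplified variant and not the MILPERCEN/ARI procedure itself: here each personal identifier is replaced by a keyed code before the files are merged, and HMAC-SHA256 is chosen purely for illustration. The field names, sample values, secret key, and the functions code_id, insulate, and merge_on_code are all hypothetical.

```python
import hashlib
import hmac


def code_id(secret_key: bytes, identifier: str) -> str:
    """Replace a personal identifier with a keyed code (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def insulate(records, id_field, secret_key):
    """Swap the identifier for its code and drop the identifier itself."""
    out = []
    for rec in records:
        coded = {k: v for k, v in rec.items() if k != id_field}
        coded["code"] = code_id(secret_key, rec[id_field])
        out.append(coded)
    return out


def merge_on_code(left, right):
    """Link two insulated files through their shared codes only."""
    right_by_code = {rec["code"]: rec for rec in right}
    return [
        {**rec, **right_by_code[rec["code"]]}
        for rec in left
        if rec["code"] in right_by_code
    ]


# Hypothetical inputs; every value is invented.
key = b"held-by-the-key-custodian-only"
personnel_file = [
    {"ssn": "000-00-0001", "reenlisted": True},
    {"ssn": "000-00-0002", "reenlisted": False},
]
asvab_file = [
    {"ssn": "000-00-0001", "mm": 105},
    {"ssn": "000-00-0003", "mm": 98},
]

merged = merge_on_code(
    insulate(personnel_file, "ssn", key),
    insulate(asvab_file, "ssn", key),
)
```

The merged file carries the code but no identifier, so it can support a validation study while the key that would reconnect codes to persons is destroyed or stored separately. In practice, low-entropy identifiers would still need the key itself to be well protected.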

Assuming reasonable precaution in data collection, maintenance,
storage, and reporting, the insulated data base with its disposable code links
and the resulting statistical record will easily meet future requirements
for privacy protection of ARI integrated data bases. There are many other
effective methods to insulate and link record systems. There is no one
best way to protect personal information. The point is that such
protection can and should be provided.


1. MILPERCEN Data--Coded

2.  Data  Transmission 

3.  ARI  ASVAB  Data--Coded 

4.  Data  Transmission 

5. Merge Data--Edit and Match
Codes 

6.  Data  Transmission  of  All 
Coded  and  Merged  Data 


7.  The  Data  Base--Statistical 
Records  only 

8.  Transmission  of  Cryptographic 
Key 

9.  Safe  Storage  of  Code  Key 


Figure 2. Schematic flow and proposed development of one insulated data
base.


Summary. Two studies were conducted to determine the relative training and
cost effectiveness of two simulated versions of the F-111 Converter/Flight
Control System maintenance test station as compared to the operational test
station equipment. Study 1 was designed to assess classroom and field
performance of students trained on the actual test station and those trained on
the 6883 3-dimensional simulator. Further, the life cycle costs of using the two
systems for training were compared. The results of Study 1 indicated that
there were no significant differences in either classroom or field performance
as a function of training equipment. However, the comparison of system costs
indicated that actual equipment costs were approximately twice those of the
6883 3-dimensional simulator when calculated over a 15-year life cycle. Study
2 was designed to make similar performance and cost comparisons among the
3-dimensional simulator, a 2-dimensional simulator, and the operational test
station equipment. The preliminary results of this study show that there were
no significant differences in student performance as a function of the training
device employed. The life cycle cost comparison among all three training
devices is not yet complete.


Introduction 


Traditionally,  hands-on  training  in  the  maintenance  of  the  F-111 
Converter/Flight Control System has been provided to avionics maintenance
trainees using operational 6883 test station equipment. In fact, the use of
operational test station equipment has been standard procedure throughout the Air
Force aircraft maintenance training schools. However, in the early 1970s, the
need for more cost effective trainers became apparent. Miller (1974) and Miller
and Gardner (1975) provided an extensive analysis of the need for simulated
training in general and, specifically, for the development of the 6883
3-dimensional (3-D) simulator employed in part of this evaluation. Factors
influencing the decision to proceed with the development and implementation of
6883  simulators  included  the  high  cost  and  low  reliability  of  actual  equipment 
trainers  (AET),  safety  factors,  high  noise  levels  in  AET  work  areas,  and  the 
limited  scope  of  training  that  can  be  accomplished  on  AETs  because  appropriate 
malfunctions  cannot  be  inserted  without  extensive  and  costly  modifications. 

Unfortunately,  methodologically  sound  comparative  studies  of  the 
training  and  cost  effectiveness  of  simulators  and  actual  equipment  used  for 
training are conspicuously rare, and available literature on the use of
simulators specifically for maintenance training is even more limited. Of the
studies which have been conducted, two important points have been established:
maintenance training simulators can be cost effective (Miller, 1978; Fink,
Shriver, Downing, & Miller, 1978; Montemerlo, 1977), and maintenance training
simulators can provide training at least comparable to the actual equipment
(Miller, 1974; Hurlock & Slough, 1976; Daniels, Datta, Gardner, & Modrick,
1975).


¹This work is being conducted by the Denver Research Institute for
the Air Force Human Resources Laboratory at Lowry AFB under contract No.
F33615-78-C-0018. A more detailed discussion of Study 1 is provided in report
number AFHRL-TR-80-24.


While  only  a  few  simulator  maintenance  training  programs  have  been 
in  operation  long  enough  to  assess  implementation  effects,  some  problems  have 
become  apparent.  First,  there  is  the  need  to  assess  the  fidelity  of  the 
training  device  in  order  to  establish  the  equivalency  of  training  contexts. 
Otherwise,  the  evaluation  design  may  incorporate  different  training  and 
testing  contexts  and  render  the  results  inconclusive.  Special  attention  must 
be  given  to  psychological  fidelity  since  the  level  of  realism  can  have 
important  impacts  on  training  effectiveness.  Second,  there  is  the  need  for  a 
criterion  measure  of  training  effectiveness.  Previous  researchers  have  had 
difficulty  in  establishing  such  a  measure  without  incurring  high  costs 
(Miller,  1978).  And  finally,  problems  can  arise  as  a  result  of  instructor 
opposition.  Instructors  may  see  their  teaching  role  threatened  by  the 
incorporation  of  simulation  methods  or  may  disagree  with  required  changes  in 
the  academic  structure. 

The  present  studies  necessarily  involved  some  of  these  issues.  It 
was  important  to  consider  the  role  of  the  simulated  devices  in  training,  their 
fidelity  to  AETs,  and  user  acceptance  of  the  new  equipment.  Furthermore,  the 
cost  analysis  of  the  simulators  and  AET  station  forced  examination  of  concepts 
and  assumptions  that  link  training  effectiveness  with  the  respective  system 
costs. 


Study  1 


The  overall  objective  of  Study  1  was  to  design  and  implement  a  com¬ 
prehensive  comparative  cost  and  training  effectiveness  evaluation  between  the 
6883  3-D  simulator  and  the  operational  6883  test  station  equipment. 


Research  Design 

This  study  was  designed  to  examine  three  major  questions:  (1)  the 
relationship  between  mode  of  training  and  classroom  performance,  (2)  the  rela¬ 
tionship  between  mode  of  training  and  job  proficiency  in  the  field,  and  (3) 
the  relative  costs  of  using  the  two  devices  for  maintenance  training.  The 
basic  design  used  to  assess  classroom  and  field  performance  is  shown  in  Figure 
1. 


For the purpose of assessing classroom performance, four experimental
groups were defined by the two training modes and two testing modes
(Groups A-D). Clearly, performance differences in operating and maintaining
actual test stations as a function of the training equipment were of primary
interest. However, two testing modes, actual and simulator equipment, were
used because it was expected that the simulator might provide training or
testing capabilities which were not available on the actual equipment.
Further, it was necessary to determine the extent to which any observed
differences in performance were due to familiarity with the test equipment.


[Figure 1: two-by-two matrix crossing TRAINING mode (Simulator, Actual
Equipment) with TESTING mode (Simulator, Actual Equipment), followed by
FIELD ASSIGNMENT.]

Figure 1. Research Design Used for Comparing Field Performance.


As shown in Figure 1, to assess the impact of training mode on job
performance  in  the  field,  it  was  necessary  to  consider  eight  experimental  groups. 
That  is,  the  assessment  of  field  performance  was  conducted  in  view  of  the  four 
levels  of  training  resulting  from  the  various  combinations  of  classroom  train¬ 
ing  and  testing  modes. 

The  comparison  of  costs  associated  with  using  the  6883  actual  test 
station  and  the  6883  3-Dimensional  simulator  for  training  was  based  on  the 
"ingredients  approach"  discussed  by  Levin  (1975)  in  which  cost  elements  are 
identified  and  evaluated  consistent  with  the  ATC  acquisition  and  training  en¬ 
vironment.  The  cost  elements  or  ingredients  associated  with  each  cost  category 
were  evaluated  either  as  one-time  costs  (primarily  Investment  Costs)  or  as 
Recurring  Annual  Costs,  consistent  with  the  Air  Force  perspective  on  economic 
analysis  (Williams,  1977).  The  major  cost  categories  used  in  the  present  anal¬ 
ysis  were:  facilities,  equipment,  instructional  material/training,  personnel, 
students, and miscellaneous. For purposes of the cost analysis, it was assumed
that both trainers had equal training effectiveness and that the life cycle
cost comparison between trainers would indicate which device was the most cost
effective, i.e., the trainer exhibiting the least total cost of ownership.
This was a useful approach since it established baseline data and was
consistent with the original simulator design objective of developing a functional
(training) replacement for the 6883 test station (Miller & Gardner, 1975).


Methodology 

A total sample of 115 F-111 avionics maintenance trainees participated
in Study 1. Students were assigned to the treatment groups defined by
training sequence and the two levels of training mode and test mode. Although
training sequence was initially considered in the assignment strategy,
subsequent analysis indicated that performance did not vary as a function of training
sequence. The distribution of students among the four resulting experimental
groups is shown in Figure 1.

The  assignment  of  students  to  groups  was  essentially  random.  Tests 
for  aptitude  and  prior  achievement  indicated  no  biases  across  groups.  Further, 
there  were  no  apparent  biases  in  assignment  with  regard  to  student  gender. 
One day at the end of the Converter/Flight Control Systems practical
block of instruction was made available for data collection. This temporary
departure  from  the  normal  training  schedule,  effective  for  the  duration  of 
this  project,  allowed  data  to  be  collected  without  altering  the  usual  6883 
training  protocol.  In  order  to  collect  all  information  required  from  an  entire 
class  of  students  (usually  six  airmen)  in  a  single  day,  it  was  necessary  to 
administer  the  performance  tests  and  conduct  the  interviews  in  a  different 
order  for  each  student.  Administration  of  the  trouble-shooting  performance 
test  for  the  first  student  was  followed  by  the  personal  interview.  During  the 
time  that  any  one  student  was  being  administered  the  trouble-shooting  perform¬ 
ance  test,  the  remaining  students,  who  had  either  completed  or  were  awaiting 
their  turn  to  be  tested,  were  given  a  Projected  Job  Proficiency  Test.  When 
all  students  in  a  class  had  completed  the  performance  test,  the  Projected  Job 
Proficiency  Test,  and  had  been  interviewed,  the  evaluation  day  was  complete. 

In  addition  to  this  primary  data  collection  effort,  other  performance  data  was 
obtained  from  the  student  records.  Finally,  subsequent  to  placement  in  the 
field,  students  were  again  interviewed  concerning  their  training  experience 
and  supervisors  were  asked  to  rate  student  field  performance. 

Data  needed  for  the  cost  analysis  were  obtained  primarily  from  Air 
Force  acquisition  and  maintenance  records.  However,  it  was  also  necessary  to 
monitor  training  deviations  and  unscheduled  maintenance  of  equipment  throughout 
the study to insure an accurate estimate of operating costs of both systems.


Results 


A  central  issue  to  be  addressed  by  this  investigation  was  whether 
simulator  and  actual  equipment  training  would  result  in  equal  levels  of  student 
performance  on  a  practical  trouble-shooting  problem  as  might  be  encountered  in 
the  field.  The  test  which  was  developed  consisted  of  a  timed,  serial,  hands- 
on,  29  item  task.  The  test  allowed  three  overall  dependent  measures  of  perform¬ 
ance:  a  total  score,  the  total  time  necessary  for  test  completion,  and  a  three 
point  rating  of  the  degree  of  assistance  required  for  test  completion.  Table 
1  shows  the  mean  scores  for  each  of  the  experimental  groups  on  each  of  these 
three  measures. 


TABLE  1 

Overall  Trouble-Shooting  Test  Proficiency  as  a 
Function  of  Training  and  Testing  Modes 
Completion times for this actual/actual (training mode/test mode)
versus simulator/actual comparison did not differ significantly (F(1,105) =
2.193, p = .19) while there was a marginal difference in total test scores
(F(1,105) = 2.903, p = .09). That is, actual-trained students scored slightly
higher on actual equipment testing than simulator-trained students. This effect
is minor, though, and may be the result of machine-specific experience rather
than qualitative differences in training. It seems entirely plausible, for
example, that students would be more nervous being tested on the real test
station if their only experience had been with a simulator where the
ramifications of errors are not as serious.

A  brief  examination  of  the  degree  of  assistance  required  was  carried 
out  despite  this  measure's  lack  of  sensitivity.  The  group  means  reported  in 
Table  1  are  extremely  similar,  and  a  chi  square  analysis  showed  no  evidence 
that ratings were related to experimental conditions (χ²(6) = 9.63, p = .14).
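
As a reader's check on the reported statistic, the upper-tail probability of a chi-square value can be computed directly. The sketch below is illustrative only and is not part of the original analysis. The function name `chi2_sf_even_df` is hypothetical, and the closed-form series it uses applies only when the degrees of freedom are even, as they are here.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series

# Statistic reported in the text: chi-square with 6 degrees of freedom = 9.63
p_value = chi2_sf_even_df(9.63, 6)
print(f"p = {p_value:.3f}")
```

Under these assumptions the computed probability is about .14, well above conventional significance levels, which agrees with the conclusion that assistance ratings were unrelated to experimental condition.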

In  sum,  then,  simulator-  and  AET-trained  students  did  not  differ 
appreciably  with  respect  to  overall  trouble-shooting  ability  as  measured  by 
the  practical  test.  A  very  slight  advantage  in  test  accuracy  was  found  for 
actual-  as  opposed  to  simulator-trained  students  tested  on  the  actual  6883, 
but this finding was not mirrored using completion time as a measure. The
simulator  proved  to  be  a  somewhat  slower  testing  device  which  caused  some  inter¬ 
ference  for  actual-trained  students. 

The  Projected  Job  Proficiency  Test,  a  multiple  answer  performance 
test,  also  indicated  no  significant  differences  among  groups  as  a  function  of 
training  mode.  Hence,  it  can  be  concluded  that  all  students,  regardless  of 
training  mode,  acquired  equivalent  job-related  experience.  In  an  effort  to 
assess  user  acceptance  of  the  simulator  as  an  alternative  training  device, 
both  students  and  instructors  were  asked  about  the  utility  and  adequacy  of  the 
equipment.  While  the  feedback  was  mixed,  generally  students  indicated  that, 
regardless  of  the  equipment  used,  the  training  experience  was  both  adequate 
and  enjoyable.  On  the  other  hand,  interviews  with  a  limited  number  of  instruc¬ 
tors  revealed  a  concern  about  the  slow  response  time  of  the  3-dimensional  sim¬ 
ulator  and  identified  the  need  for  more  sophisticated  courseware. 

The  results  of  the  life  cycle  cost  comparison  between  the  3- 
dimensional  simulator  and  the  actual  equipment  are  summarized  in  Table  2. 

From Table 2 it can be seen that when both Investment Costs and Recurring
Annual Costs are considered, the actual equipment is more than twice as
expensive to acquire and operate as a training device. Similarly, even if
Investment Costs are considered as sunk costs, the cost in constant dollars of
operating the actual equipment is still about twice that of the simulator
($3,336,150 and $1,588,020, respectively).

In  considering  the  policy  implications  of  a  2  to  1  cost  effective¬ 
ness  ratio  in  favor  of  the  6883  simulator,  it  is  important  to  realize  that  the 
estimate  for  the  AET  is  extremely  conservative.  The  costs  of  the  CENPAC  compu¬ 
ters  were  not  allocated  to  the  6883  test  station  and  no  cost  element  was  in¬ 
cluded  in  the  equipment  category  that  reflects  the  cost  of  installation  and 
start-up of the AET. Installation and start-up costs for the simulator were
fully  allocated  and  these  costs  were  perhaps  unexpectedly  high  due  to  problems 
encountered  in  bringing  the  simulator  on-line. 


TABLE 2

The Life Cycle Cost Comparison

Cost Categories              Simulator            AET

Facilities                  $  110,650     $  110,650
Equipment                    1,594,330      4,902,140
Instructional/Materials         26,000         27,890
Personnel                       94,250         72,530
Students                       357,770        357,770
Miscellaneous                        0              0

TOTAL                       $2,183,000     $5,470,980

NPV (1978)                  $1,501,090     $3,895,680

$/student-hour              $348/15 yrs =  $902/15 yrs =
                            $23/student-hr $60/student-hr
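
The arithmetic behind the cost comparison can be sketched in a few lines. This is a reader's check under stated assumptions, not the authors' costing model. The dollar figures are transcribed from Table 2 and the accompanying text, the instructional and miscellaneous categories are combined into one entry per device because only their sum can be verified against the totals, and the names `total_cost` and `cost_ratio` are hypothetical.

```python
# Dollar figures transcribed from Table 2. Instructional materials and
# miscellaneous costs are combined into one entry per device (an assumption
# made here because only their sum can be checked against the totals).
SIMULATOR = {
    "facilities": 110_650,
    "equipment": 1_594_330,
    "instructional_and_misc": 26_000,
    "personnel": 94_250,
    "students": 357_770,
}
AET = {
    "facilities": 110_650,
    "equipment": 4_902_140,
    "instructional_and_misc": 27_890,
    "personnel": 72_530,
    "students": 357_770,
}

# (simulator, AET) pairs as reported in the table and in the text.
REPORTED = {
    "total": (2_183_000, 5_470_980),
    "npv_1978": (1_501_090, 3_895_680),
    "recurring": (1_588_020, 3_336_150),
}

def total_cost(costs):
    """Sum every cost ingredient for one training device."""
    return sum(costs.values())

def cost_ratio(simulator_cost, aet_cost):
    """How many times more the AET costs than the simulator."""
    return aet_cost / simulator_cost

sim_total = total_cost(SIMULATOR)
aet_total = total_cost(AET)
ratios = {name: cost_ratio(sim, aet) for name, (sim, aet) in REPORTED.items()}

print(sim_total, aet_total)
for name, ratio in ratios.items():
    print(f"{name}: AET costs {ratio:.2f} times the simulator")
```

The ingredient sums reproduce the reported totals, and the three ratios fall roughly between 2.1 and 2.6, consistent with the statement that the actual equipment costs about twice as much as the simulator.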

Study  2 


To  more  fully  explore  the  use  of  alternative  training  devices,  a  2- 
dimensional  6883  simulator  was  designed  and  developed.  This  training  simulator 
is significantly different from the 3-dimensional simulator evaluated in Study
1 in a number of important ways. First, the 2-dimensional device is composed
of four components, or part task trainers (PTT), which can be used for
instruction in either an integrated system approach or in an individual component
approach.  The  four  components  are  a  logic  trainer,  a  switching  complex  trainer, 
an  O-scope  trainer  and  a  flat-panel  trainer.  Secondly,  the  design  of  the  2- 
dimensional  simulator  incorporates  less  physical  fidelity  to  the  actual  equip¬ 
ment  than  the  3-dimensional  simulator.  Finally,  the  switching  complex  trainer, 
a  component  of  the  2-dimensional  system,  provides  training  which  is  not  avail¬ 
able  on  the  3-dimensional  simulator  and  which  has  previously  been  taught  only 
on  the  actual  test  station  equipment  or  using  more  traditional  classroom  methods 
(e.g.,  black  board). 

The purpose of Study 2 was to incorporate this 2-dimensional simulator
into the training and cost comparison conducted in Study 1. Initially, it
was  expected  that  the  performance  and  cost  data  relative  to  the  2-dimensional 
simulator  could  be  easily  integrated  with  that  for  the  3-dimensional  simulator 
and  actual  equipment  already  collected  in  Study  1.  However,  it  became  apparent 
that  this  would  not  be  possible.  Concurrent  with  the  delivery  of  the  2- 
dimensional  simulator,  several  course  objective  and  training  format  changes 
occurred  which  resulted  in  dramatic  modifications  to  the  Converter/Flight  Con¬ 
trol  System  blocks  of  instruction.  Thus,  it  was  necessary  to  collect  perform¬ 
ance  data  simultaneously  for  all  three  training  devices. 

Since  previously  collected  performance  data  could  not  be  utilized 
in  Study  2,  a  decision  was  made  to  use  this  opportunity  to  modify  the  data 
collection instruments used in Study 1 and to incorporate a preassessment test
package  into  Study  2. 


Research  Design 


Figure 2 shows the research design employed to compare student
performance across the three training devices.


                         Testing Mode

Training Mode        AET      3-D      2-D

AET                   A
3-D                   B        E
2-D                   C                 F
3-D/PPT               D       (X)      (Y)

Figure 2. Study 2 Research Design


Since  the  2-dimensional  simulator  was  developed  with  peripheral  part-task 
trainers  (PTT),  one  of  which  was  used  to  provide  patch-panel  training  for  3- 
dimensional  simulator  training,  the  3-D/PPT  training  mode  was  added  to  the 
design.  In  sum,  the  Study  2  evaluation  effort  was  designed  to  analyze  only 
six  of  the  twelve  possible  configurations  due  to  a  limited  anticipated  supply 
of  students  during  the  course  of  the  evaluation.  Also,  since  subsequent  equip¬ 
ment  malfunctions  necessitated  reassignment  of  experimental  groups,  not  all 
matrix  cells  could  be  filled  as  anticipated,  and  two  additional  cells  were 
included (X and Y in Figure 2), although sample sizes in these cells remain
very small.


Methodology 


A  total  of  119  students  have  been  tested  to  date  in  Study  2.  A 
major  effort  was  devoted  to  redesigning  student  performance,  user  acceptance 
and  equipment  monitoring  data  collection  forms  based  on  observations  made  during 
Study 1. A preassessment test package consisting of three aptitude tests was
also included in this study and was administered to all students at the
beginning of the 6883 instructional block. This preassessment, including the Delta
Concealed Figures, the Delta Reading Vocabulary Test, and the Ship Destination
Test, was designed to assess the spatial, verbal, and logical abilities of
students. The purpose of this package of tests at the initiation of 6883 training
was  to  ascertain  if  differences  in  student  aptitude  exist  which  could  influence 
subsequent  performance  evaluation. 

The  two  major  Study  1  performance  tests,  the  Hands-On  Trouble-Shooting 
Test  and  the  Projected  Job  Proficiency  Test,  were  redesigned  to  facilitate  a 
more  rigorous  item  analysis  and  to  reduce  the  overall  amount  of  time  needed 
for  data  collection.  An  additional  Paper  and  Pencil  Trouble-Shooting  Test  was 
incorporated  into  the  performance  measure  package  to  assess  students'  knowledge 
of  the  electronics  system  and  their  ability  to  follow  TO  diagrams  in  the  identi¬ 
fication  of  electronic  failures  and  malfunctions.  The  Student  Interview,  In¬ 
structor  Survey  and  Training  Monitoring  forms  were  also  redesigned  to  simplify 
implementation.  The  field  follow-up  forms  remained  essentially  the  same  as 
those used in Study 1.
Results


Study  2  analyses  are  preliminary  at  this  time  since  the  contract 
period  will  not  expire  until  December  30,  1981,  and  data  collection  will  con¬ 
tinue  through  mid-October  1981.  Table  3  shows  the  means  for  the  four  groups 
on  preassessment  measures. 


TABLE 3

Preassessment Measures for the Four Experimental Groups

                                        Training Mode

                              AET      2-D      3-D    3-D/PPT    Total

Hidden Figures        Mean    5.60     5.58     4.58     4.62      5.20
                      (n)     (35)     (36)     (33)     (13)     (117)

Vocabulary            Mean   26.49    24.03    27.79    26.92     26.15
                      (n)     (35)     (36)     (33)     (13)     (117)

Ship's Destination    Mean   35.51    36.61    35.27    33.83     35.61
                      (n)     (35)     (36)     (33)     (12)     (116)
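
The Total column of Table 3 can be checked against the group columns, since an overall mean is the average of the group means weighted by group size. The sketch below is a reader's check rather than part of the original analysis. The helper name `pooled_mean` is hypothetical, and the 3-D/PPT Ship's Destination mean is taken to be 33.83.

```python
def pooled_mean(means, sizes):
    """Overall mean recovered from group means, weighting each group by its n."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

# Group order: AET, 2-D, 3-D, 3-D/PPT (values transcribed from Table 3)
hidden_figures = pooled_mean([5.60, 5.58, 4.58, 4.62], [35, 36, 33, 13])
vocabulary = pooled_mean([26.49, 24.03, 27.79, 26.92], [35, 36, 33, 13])
ships_destination = pooled_mean([35.51, 36.61, 35.27, 33.83], [35, 36, 33, 12])

for name, value in [
    ("Hidden Figures", hidden_figures),
    ("Vocabulary", vocabulary),
    ("Ship's Destination", ships_destination),
]:
    print(f"{name}: {value:.2f}")
```

Each pooled value agrees with the reported total to two decimal places, which supports the column assignment used in the table.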

A preliminary analysis indicated that there were no significant
differences in the preassessment scores among the four experimental groups.
Specifically, when all four levels of training were included in a series of
one-way analyses of variance, no significant differences among means were found
for any of the preassessment tests. However, when the 3-D/PPT group was ignored,
a marginally significant effect of training mode was found for the vocabulary
measure (F(2,101) = 2.60, p = .079). Table 4 shows the means for the four
groups for each of the three performance measures used.


TABLE 4

Performance Measures for the Four Experimental Groups

                                    Training Mode

                          AET      2-D      3-D    3-D/PPT    Total

Hands-on Score    Mean   21.50    22.53    21.74      --      21.91
                  (n)     (34)     (34)     (35)      --      (116)

PJPT Score        Mean   20.77    20.58    20.85      --      20.79
                  (n)     (35)     (36)     (35)      --      (119)

PAPTS Score       Mean   19.29    19.47    17.52      --      19.11
                  (n)     (31)     (36)     (29)      --      (109)

When  all  four  levels  of  training  were  submitted  to  one-way  analyses 
of variance, no significant differences among means were found for any of the
three  measures.  When  the  3-D/PPT  group  was  not  included  in  the  analysis,  a 
marginally  significant  effect  of  training  was  found  for  the  hands-on  test  scores 
(F(2,100)  =  2.51,  p  =  .087).  However,  it  should  be  stressed  that  this  analysis 
collapsed  student  groups  across  testing  modes  and  that  being  trained  and  tested 
on  different  systems  may  result  in  confounding  of  performance  measures  due  to 
unfamiliarity with the equipment used for testing. In fact, an analysis
involving only those groups of students who were trained and tested on the
same equipment revealed no significant differences in performance on any of
the three measures.
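
For readers unfamiliar with the procedure, the one-way analysis of variance used throughout these comparisons can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `one_way_anova_f` is hypothetical and the scores are invented, since the individual student data are not reported.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three training modes (not data from the study).
example_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

The degrees of freedom follow directly from the design: with three training modes and 103 students, as in the hands-on comparison that excludes the 3-D/PPT group, the statistic is reported as F(2,100).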

Unfortunately,  at  this  time  the  data  required  to  compare  perform¬ 
ance  as  a  function  of  training  equipment,  when  only  the  actual  equipment  is 
used  for  testing,  are  unavailable.  This  information  will,  of  course,  be  a 
major  component  of  the  final  analysis  of  Study  2  data. 

Finally,  while  the  life  cycle  cost  comparison  among  the  three  train¬ 
ing  devices  has  not  been  completed,  it  is  clear  that  the  2-dimensional  simulator 
will  be  significantly  more  cost  effective  than  the  actual  equipment.  However, 
the  relative  cost  effectiveness  of  the  two  simulated  test  station  systems  is 
not  known  at  this  time. 


Discussion 


Based on the results of Study 1 and the preliminary analysis of Study
2  data,  it  can  be  concluded  that  students  trained  on  the  6883  3-D  and  2-D  sim¬ 
ulators  performed  as  well  as  students  trained  on  actual  equipment.  It  could 
be  argued,  however,  that  the  primary  benefits  of  employing  simulated  trainers 
were  simply  not  realized.  It  is  often  assumed  that  simulators  designed  to 
replace  the  more  costly,  less  reliable,  and  more  dangerous  actual  equipment 
trainers  must  maintain  a  high  level  of  physical  and  psychological  fidelity. 

This  assumption  stems  from  the  fact  that  the  simulators  are  usually  integrated 
into  existing  curricula  and  are  generally  used  by  instructors  in  a  manner  identi¬ 
cal  to  the  way  the  actual  equipment  had  been  used.  Given  this  limited  perspec¬ 
tive,  it  is  not  surprising  to  observe  equivalent  student  performance  when  em¬ 
ploying  simulated  and  actual  equipment  trainers.  Hence,  the  outcome  of  cost 
comparisons  is  likely  to  become  the  major  factor  directing  future  procurement 
decisions. 


It  would  seem  that  a  decision  to  use  simulators  as  supplements  to 
actual  equipment  trainers  would  allow  more  flexibility  in  their  design,  since 
the  actual  equipment  would  be  retained  to  insure  compliance  with  existing 
training  requirements.  If  simulated  and  actual  equipment  trainers  are  used  in 
conjunction,  improvement  in  performance  must  be  demonstrated  to  justify  the 
obvious  additional  cost  of  acquiring  and  maintaining  two  training  devices. 

In addition to performance and cost considerations, other factors
may  play  a  major  role  in  the  decision  to  employ  simulated  test  stations  as 
trainers.  For  example,  it  is  unlikely  that  a  simulator  of  any  quality  will  be 
accepted  into  existing  training  curricula  if  it  is  not  somewhat  consistent 
with  established  instructional  practices.  To  encourage  instructor  acceptance, 
the  simulator  should  be  effective  as  both  a  visual  aid  and  demonstration  tool. 
This  would  allow  the  simulator  to  be  effectively  incorporated  into  training 
segments  (e.g.,  theory  familiarization)  which  do  not  include  extensive  practi¬ 
cal  troubleshooting  experience.  Such  a  dual  purpose  simulator  would  be  almost 
essential  if  total  replacement  of  existing  equipment  is  planned.  The  potential 
impact of simulator training on student performance may best be achieved if a
"utilization  strategy"  is  designed  to  accompany  the  placement  of  the  equipment 
into  an  existing  training  environment.  Designing  such  a  plan  to  highlight  the 
real  and  potential  uses  of  the  simulator  would  insure  that  its  unique  training 
capabilities  were  tapped,  and  that  benefits  such  as  improved  student  perform¬ 
ance,  consistent  training,  reduced  training  time  and  cost  savings  might  be 
more  readily  observed. 


The  generalizability  of  the  findings  presented  here  is,  of  course, 
limited.  While  every  effort  was  made  to  adapt  experimental  design  principles 
to  this  natural  setting,  it  was  not  possible  to  rely  on  many  of  the  premises 
of  basic  learning  theory.  Until  parameters  such  as  course  content,  training 
method,  and  duration  of  training,  all  known  to  affect  learning,  are  subject  to 
more careful control, a rigorous cost effectiveness analysis of simulation
training  is  not  possible.  To  answer  the  question,  "Do  simulators  provide  more 
cost  effective  training  than  actual  equipment  trainers?"  we  must  be  able  to 
maximize  the  capabilities  of  simulators.  Simply  stated,  operational  test  sta¬ 
tions  were  not  designed  for  training  purposes,  but  simulators  can  and  should 
be  designed  solely  for  that  purpose. 
Several efforts have been made to show that the low
validity of personality inventories can be raised.

The approaches to this problem have been inspired by one of the
following assumptions:

(1) The validity depends on the trait that is measured.

(2) The validity depends on the criterion that is used.

(3) The validity depends on the individual that is assessed.

(4) The validity depends on the situation in which the inventory
is responded to.

With regard to assumption four, Claeys et al. (1981) found that the
validity of a personality inventory was much higher when preceded
by a free self-description. The free personality description used
in this research inspired a new method for collecting situation-
related individual personality data.


FACTORS IN THE VALIDITY OF PERSONALITY INVENTORIES
W. CLAEYS, P. DE BOECK, A. BÖHRER


Mischel  (1968)  has  gathered  an  impressive  amount  of 
evidence  showing  that  personality  inventories  have  only  a 
low  validity  when  behavioral  measures  are  used  as  a  criterion. 
His  conclusion  is  that  the  typical  validity  coefficient  has  a 
value between .20 and .30, the so-called "personality coefficient"
value.

Apart from criticism of the way Mischel arrives at his
conclusion (see Block, 1977), several efforts have been made
to show that the low validity can be raised. The
approaches to this problem have been inspired by one of the
following assumptions:

(1) The validity depends on the trait that is measured,

(2) The validity depends on the criterion that is used,

(3) The validity depends on the individual that is assessed,

(4) The validity depends on the situation in which the inventory
is responded to.


Differential validity as a function of the trait

McGowan and Gormly (1976) have shown that Mischel's conclusion
may not be generalized to all of the personality
traits. Dynamism, for example, as it is expressed in physical
activity, seems to be stable across situations so that
individual differences on that trait may be measured in a
valid way. From this finding it might be concluded that there
are at least some traits lending themselves to a valid
measurement. [...] of trait difference in assessibility, due [...]
intercorrelations, [...] forward already by Loevinger [...].
[...] may be derived [...] based on the [...] that each
individual [...] own [...] that it does [...]
In conformity with that theory, Bem and Allen have found that
not all individuals may be assessed on friendliness
and conscientiousness in a valid way. The validity clearly depends
on the cross-situational variability of the individual within
the trait domain under consideration. This finding has been
replicated by Kenrick and Stringfield (1980). In the same article
it is also shown that the observability of a trait has
implications with respect to the validity, at least when ratings
by others are used as a criterion. Traits with high observability
yield clearly higher validities than traits with low observability.

Differential  validity  as  a  function  of  the  criterion 

The  criteria  that  are  used  by  Mischel  (1968)  to  evaluate  the 
validity  of  personality  inventories  are  ratings  or  observations 
of  behavior  in  specific  situations.  One  might  argue  that  such  a 
criterion  is  too  specific  and  not  reliable  enough.  That  is  why 
Fishbein and Ajzen (1974), Jaccard (1974), and Epstein (1980)
have  used  combined  criteria,  i.e.  scores  summed  or  averaged 
over  several  situations,  several  times,  and/or  several  modes  of 
behavior  expression  in  the  same  trait  domain.  All  of  these 
authors  have  shown  that  the  validity  can  be  raised  substantially 
by  using  a  combined  criterion.  Even  validity  coefficients 
amounting to .80 are reached.
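
Why aggregation raises validity can be illustrated with a standard result for composites. The sketch below is an illustration under simplifying assumptions and is not taken from the cited studies: it assumes that every single criterion correlates equally (`r_single`) with the inventory score and equally (`r_between`) with every other criterion, and the function name and the numerical values are hypothetical.

```python
import math

def composite_validity(r_single, r_between, k):
    """Correlation of a predictor with the sum of k standardized criteria.

    Assumes each criterion correlates r_single with the predictor and
    r_between with every other criterion.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return r_single * math.sqrt(k) / math.sqrt(1.0 + (k - 1) * r_between)

# Hypothetical values: a single-situation validity of .30 and modest
# consistency (.10) between behaviors observed in different situations.
for k in (1, 5, 10, 20):
    print(k, round(composite_validity(0.30, 0.10, k), 2))
```

With one criterion the function returns the single-situation validity unchanged. As more situations are summed, the situation-specific variance averages out and the coefficient rises well above the .20 to .30 range, which is the pattern the combined-criterion studies report.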

Differential  validity  as  a  function  of  the  individual 

If  it  is  true  that  individuals  differ  as  to  the  degree  of 
variability  across  situations,  then  it  might  be  expected  that 
traits  may  be  measured  in  a  valid  way  in  some  individuals  (low 
variability individuals), but not so in others (high variability
individuals).  A  personal  style  with  implications  to  behavioral 
variability  is  the  degree  of  self-monitoring.  A  high  self¬ 
monitoring  person  has  the  ability  and  the  habit  of  adapting 
himself/herself to each situation, and hence of behaving in a way
that is most appropriate to the given situation. His/her behavior
is supposed to depend more on the situation than on personality
traits. This hypothesis has been confirmed by Snyder
(1974),  and  by  Snyder  and  Monson  (1975). 
The logical consequence of this finding is that inventories
should be less valid for high self-monitoring individuals. That
is indeed what has been found by Snyder and Swann (1976). In a
study by Lippa (1978), however, this finding is only replicated
with respect to neuroticism, but not to extraversion.

High  self-monitoring  subjects  are  able  to  monitor  their  behavior 
so  that  they  do  not  seem  neurotic  if  it  is  appropriate,  whereas 
low  self-monitoring  subjects  cannot  hide  a  possible  neuroticism, 
even if such behavior is really inappropriate in the situation
in question.

Behavioral variability may be considered as an expression
of the flexibility of a well-adapted person in contrast to the
rigidity of a maladjusted person. This consideration is
consistent with studies by Moos (1968), Kogan and Wallach (1967), and
Snyder and Monson (1975), so that it may be expected that personality
inventories are more valid in abnormal than in normal
populations.

Finally,  there  might  be  individual  differences  in  the  degree 
of  awareness  individuals  have  of  their  behavior.  Individuals 
who  are  aware  of  their  behavior  may  be  expected  to  give  more 
valid responses to a personality inventory. Awareness of one's own
behavior is a special case of a more general style variable
that is called "self-focused attention" ("private self-consciousness").
It has been shown by Scheier, Buss, and Buss (1976) that
an aggressiveness inventory is clearly more valid in a group of
strongly self-focused individuals (r = .66) than in a group of
only weakly self-focused individuals (r = .09).

Differential validity as a function of the situation

From the research of Duval and Wicklund (1972), it may be
concluded that self-focused attention can be prompted by the
situation, for example by putting a mirror in front of the subject
so that he/she can see him/herself. Other methods are: playing a
tape recorder on which the subject hears his/her own voice, and
using a TV camera. It turned out that self-report validity is
raised under conditions favorable to self-focused attention
(Pryor, Gibbons, Wicklund, and Hood, 1977).


Another situation that has been shown to favor inventory
validity is one in which a free-response method of self-report
precedes the inventory (Claeys, De Boeck, and Van Den Bosch, 1981).
It might be interesting to elaborate somewhat on this research
because the results are clear but not yet published.

A first study was set up to compare the validity of a
free-response method and a structured personality inventory.
The inventory (Five Personality Factors Test, 5PFT; Elshout
and Akkerman, 1975) consists of five factorial scales:
Extraversion, Friendliness, Conscientiousness, Neuroticism,
and General Culture. The inventory is based on the personality
sphere described by Cattell (1957). In the free-response method,
the subjects were required to give 10 adjectives that are
descriptive of their personality. Afterwards, all of the adjectives
were judged on each of the five traits of the inventory by a
group of 10 experts, so that each adjective could be given
a value on each trait. The averaged value of the adjectives from
a subject on a trait served as the score of that subject on the
trait in question. Two orders of presentation were used for
the self-report: inventory/free-response method, and
free-response method/inventory. The criteria to assess the
validity consisted of behavioral ratings with respect to the
traits of the inventory. Each subject was rated by his father,
his mother and a friend. Subjects were 84 male students from
the final year of a high school in a Dutch-speaking town in
Belgium.
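
The scoring of the free-response method described above can be
sketched as follows. This is only an illustration under stated
assumptions: the function names, the two-expert panel, and the
-2 to +2 judgment scale are hypothetical and are not taken from
the study, which used 10 experts and 10 adjectives per subject.

```python
from statistics import mean

TRAITS = ("Extraversion", "Friendliness", "Conscientiousness",
          "Neuroticism", "General Culture")


def adjective_values(expert_ratings):
    """Average the experts' judgments of one adjective, trait by trait.

    expert_ratings: one dict per expert, mapping trait name -> judged value.
    """
    return {t: mean(r[t] for r in expert_ratings) for t in TRAITS}


def subject_scores(adjectives, values):
    """Score a subject on each trait as the mean value of his adjectives.

    values: dict mapping adjective -> trait values from adjective_values().
    """
    return {t: mean(values[a][t] for a in adjectives) for t in TRAITS}


# Hypothetical data: two experts, two adjectives, an invented -2..+2 scale.
values = {
    "talkative": adjective_values([
        dict(zip(TRAITS, (2, 1, 0, 0, 0))),
        dict(zip(TRAITS, (2, 0, 0, 0, 1))),
    ]),
    "orderly": adjective_values([
        dict(zip(TRAITS, (0, 0, 2, 0, 0))),
        dict(zip(TRAITS, (0, 1, 2, -1, 0))),
    ]),
}
scores = subject_scores(["talkative", "orderly"], values)
print(scores["Extraversion"], scores["Conscientiousness"])
```

Each subject thus receives five trait scores from the free-response
task, directly comparable in kind to the five scale scores of the
inventory.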

The validity of the two self-report methods, the inventory
and the free-response method, was about equal, but very low
(r = .20), in conformity with the personality coefficient cited
by Mischel (1968). A more interesting finding, however, was
that the validity of the inventory was much higher (about .50
on the average (1)) when the free-description method preceded
the inventory than when the reverse order was used, in which
case the validity was virtually non-existent. The same effect
shows up with respect to the validity of the free-response
method, i.e. its validity is much higher when it comes first
than when it is preceded by the inventory.

(1) The validity coefficients that are mentioned concern a
combined criterion, i.e. the summed ratings from father, mother and
a friend. However, the task order effect on validity did not
depend on the rater. Similar effects were found for each of the
raters.

To test the replicability of these findings, a second study
was set up with essentially the same procedure and with a sample
from the same population. Again the validity difference was
found as a function of the task order. The validity amounts to
.43 when the subject starts with the free-response method, whereas
a value of only .15 is reached when the inventory precedes.

A possible reason for this effect is that the free-response
method stimulates self-focused attention much more than the
personality inventory.

Discussion  and  Conclusion 

In general, there seem to be means to raise the validity
of personality inventories. The first solution implies that only
some traits may be measured. Therefore, this solution is only
satisfying if one is interested in one of those exceptional
traits. The second solution asks for combined criteria, whereas
in practice one is mostly interested in simple criteria. The
third solution restricts the use of inventories to subgroups
of subjects, so that this solution too is rather unsatisfying.
The fourth solution, namely providing validity-favoring
situations, is more hopeful, as it might be general in its
applicability.

One may wonder, however, whether a concentration on the use
of inventories is still appropriate in the light of a view on
personality that has become less trait-based, less nomothetic,
and more oriented to the individual person and his interaction
with the situation, and to the processes involved in that
interaction (Lamiell, 1981; Magnusson and Endler, 1977; Mischel,
1973).

A method oriented to the individual person and his situations
was developed by L. Pervin (1976, 1978). Pervin asks the person to
describe a number of rather important situations or events he
experienced during the past year. After that, the person has to say
how he perceived each situation or event, what his feelings were,
and how he behaved. The information obtained is put in two
matrices. The first one contains situations x situational
characteristics (feelings, perceptions), the second one situations
x behavioral characteristics. The task of the subject is now to
indicate to what degree each characteristic applies to each
situation. The data obtained can be analysed by a number of
different methods.

Inspired by the Pervin method and by the free-response task
used by Claeys, Bohrer is trying to develop an individual
personality assessment technique. The subjects are required to give
10 adjectives that are descriptive of their personality. After
that, they have to indicate in which kind of situation and/or
in relation to which person each given adjective is clearly
appropriate, and to illustrate this by a concrete event. Finally,
they are asked how they feel about being (or reacting) that way.

We  hope,  by  this  technique,  not  only  to  improve  the  validity 
of  our  measurements  but  also  to  render  the  selection  procedure 
more  acceptable  for  all  subjects. 

A comprehensive job analysis was conducted as the first stage
in the development of a new examination for 13 occupations currently
using the Junior Federal Assistant examination. These are technical
occupations with the entry level at the GS-4 grade level. The need
to survey a number of occupations at the same time creates potential
advantages and disadvantages for the conduct of job analyses. Some
of the critical issues dealt with in the course of the current
project involved the level of generality of the individual task
statements, incorporation of tasks from different occupations in a
single inventory, and procedures for determining commonality between
the different occupations. Approximately 3000 employees were
selected to complete the inventory, using a sampling procedure which
took into account the relative proportions of workers in the 13
occupations and the agencies and locations where they worked.

Although job analysis is playing an increasingly important role
in personnel psychology, it still remains one of the more onerous
and expensive activities in the field. In part, the expense of job
analysis has been due to the requirements of the Uniform Guidelines,
which set detailed standards for the job relatedness of various
psychological instruments such as examinations, performance
appraisal systems, training programs etc. In addition, there has
been increased concern about the validation of such instruments
against criteria which are reflective of actual job performance.
This has also led to more elaborate job analysis techniques. Thus,
there is an incentive to develop procedures which can minimize
inefficiency while still retaining necessary methodological and
legal characteristics.

The Junior Federal Assistant (JFA) Examination Development
Project, under discussion here, involves a number of factors with a
bearing on the cost-effective conduct of the job analysis. These
factors are: 1) the purpose for which the job analysis results will
be used, 2) the level of specificity of the task statements, and
3) the number of occupations to be studied. As we shall see, the
first two points should be considered interrelated. The JFA project
is designed to produce one or more selection devices that can
replace or supplement the existing JFA examination. Thus, the
present paper applies particularly to job analyses conducted as the
initial step in examination development, more than to those
conducted for training and classification.

Actually, a general distinction has been drawn between job
analyses to be used for training/classification and those used as
the basis for developing selection instruments (e.g. McCormick,
1976, 1979). The differences between these two approaches have
generally lain more in the questions asked about the tasks than in
the level of specificity of the task statements. For example,
a job analysis conducted for training programs might ask the raters
to rate each task in terms of the time to train, the feasibility of
its being trained on-the-job, the extent of training in school etc.
A job analysis intended for selection purposes, on the other hand,
might only ask for the criticality and time spent performing the
task. However, the task statements themselves might well be
identical in both cases.

Fruchter, Marin, and Archer (1963) in fact state as a
theoretical ideal that all task statements should be equal in
specificity, although acknowledging that practical considerations
might weigh in favor of variable levels of specificity. In line
with this view is the suggestion by Melching and Borcher (1973)
that occupational inventories should include at least 200 but no
more than 600 tasks. They argue that to include fewer than 200
tasks in an occupational area would result in task statements so
general that they would yield little specific information about
jobs.

Recent research is changing this picture, however, and
indicating that there need be less concern about highly specific
and detailed task lists in examination development. Schmidt,
Hunter, and Pearlman (1981) showed in two studies comprising a
massive sample of nearly 400,000 subjects that the moderating
effects of task differences had only a small impact on test
validities. This proved true even when jobs were vastly different
in their task composition. The validities of different test types,
such as verbal ability, perceptual speed, quantitative ability
etc., were assessed across five distinct clerical families of jobs
(stenography-typing, computing-accounting, production and stock,
information and message distribution, and public contact and
investigations). A comparison of the mean observed validity across
these clerical job families with the pooled validity showed that
there was a high degree of similarity between the test validities
in different job families. In addition, the standard deviation of
validities within families was not significantly smaller than the
pooled standard deviation across job families. These results held
true for validities using either proficiency or training success
criteria.

In a second study, differences in validity were considered
across Army occupations differing very substantially in their task
composition (e.g. radio repair, welder, cook, dental assistant).
The same kind of comparisons between mean and pooled validities
were conducted for the basic test types used in the first study. A
high correlation was found between the validity estimates of the
first study and those obtained in the second. The true standard
deviation for the test type validities was found to be .1081. As
Schmidt et al. note, for SDs of this size, very large sample sizes
are needed to detect the moderating influence of different jobs,
even when the difference is one or two SDs. Since the average
validity of these different kinds of aptitude tests was .45, 90%
of the validities across different jobs have values between .27
and .63. Even at the lower end this is a respectable and useful
validity. In addition, it should be remembered that the variation
among jobs is about as extreme as will ever be found in normal
organizational situations.
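
The 90% range quoted above can be reproduced from the mean validity
(.45) and the true standard deviation (.1081) if one assumes that
true validities are approximately normally distributed across jobs.
That distributional assumption, and the function name below, are
ours for the purpose of illustration; the sketch is not the
original authors' computation.

```python
from statistics import NormalDist


def validity_range(mean_validity, true_sd, coverage=0.90):
    """Central range containing `coverage` of the true validities,
    assuming they are normally distributed across jobs."""
    tail = (1.0 - coverage) / 2.0
    dist = NormalDist(mu=mean_validity, sigma=true_sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


lower, upper = validity_range(0.45, 0.1081)
print(round(lower, 2), round(upper, 2))  # prints: 0.27 0.63
```

The bounds are simply the mean plus or minus about 1.645 standard
deviations, which is why a true SD near .11 leaves even the lower
bound at a useful level of validity.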

Other studies have also reported substantially similar
validities across occupations (e.g. Ghiselli, 1966; Maier & Fuchs,
1969). For example, Ghiselli found that clustering jobs according
to similarities of average validity patterns of aptitude tests
resulted in the grouping of jobs which had little apparent
similarity in the nature of their respective tasks. He concluded
that jobs appearing to be different in terms of the nature of the
work may require similar job-related abilities (cf. Mobley &
Ramsay, 1973; Randhawa, 1978).

Development  of  the  Inventory 

The occupations included in the present job analysis were those
series identified as primary users of the current JFA exam for
selection. The exam is being used for entry into a number of
technical occupations in the Federal government at the GS-4 level.
Although certain other occupations make selections off the JFA
register, this is done out of convenience rather than as a result
of occupational standards. The 13 occupations covered in the
present study are listed in Table 1.

It is obvious that the occupations using the current JFA exam
are a diverse group. The central question of the job analysis is the
degree of underlying similarity among these different series. In
addition to the validity generalization issue raised previously,
there are two other issues that should be considered. First,
although two series might be dealing with different subject matter,
the tasks assigned to the technicians might in fact be essentially
the same activities. Secondly, it is entry-level positions that are
going to be filled through whatever selection devices are
developed. While specialized knowledge may be important at higher
grade levels, such knowledge is not required at the entry level.

As noted, one of the major concerns of this paper is the
question of efficiency in the conduct of job analyses. This is
particularly important when a number of different occupations are
being surveyed at the same time. At the outset, a number of basic
decisions were made on the development of an appropriate task
inventory for the 13 occupations, based on the research literature
and the purpose of the study. First of all, given the accumulating
evidence on the small impact of task differences on test validities,
it was deemed unnecessary to develop long lists of molecular tasks
for each of the 13 occupations. Instead, it was felt that the level
of specificity of the statements should be somewhere between an
overall duty and a particular task. Secondly, since the purpose of
the analysis was to determine occupational commonalities, it made
sense to use a single inventory containing the duty/task statements
for all the occupations, for three reasons: 1) the reduction in the
number of task statements for each occupation made it feasible to
include them all in the same booklet, 2) all the respondents could
be exposed to the same set of tasks to rate, thereby making the
measurement of response commonality possible, and 3) in some
locations, different occupations can be found together and
surveying becomes easier when the inventory can be used by all
occupations.

The development of initial lists of tasks was started in the
usual way by examining descriptions and standards for the
occupations. Once an initial list of tasks for each occupation was
extracted from these materials, the tasks were put into a form
which would meet the requirements concerning comprehensibility,
observability etc., mentioned in the Uniform Guidelines. Besides
these more or less technical revisions, the general question of
task specificity was examined while rewriting the duty/task
statements.

The main concern was to have the duty/task statements preserve
fundamental and real differences in the nature of tasks without
reflecting minute or terminological differences. Unifying factors
existed since the overall grade levels under consideration for the
occupations are equivalent and thus the level of difficulty should
be as well. With respect to task differences, two occupations may
have tasks involving the filing of different forms; however, the
content and level of difficulty could be essentially alike. It was
decided to try to rewrite
duty/task. Once the rough form of the inventory was determined,
panels of SMEs from each occupation evaluated the task lists for
their jobs. After the review panel for each occupation had gone
over the inventories and agreed that the lists were accurate and
comprehensive, the inventories were changed to reflect their
comments.

The final inventory was composed of a background section
containing a number of biographical questions to ensure that a
representative sample had been obtained. Following this was the
list of 222 tasks for all 13 occupations. The tasks were arranged
in alphabetical order rather than by occupation to ensure that the
respondents would look at all the task statements. Each duty/task
was to be rated on performance, time spent, and importance scales.

Administration  of  the  JFA  Inventory 

In order to ensure that the results of the job analysis were
representative of the work force, a sampling plan was developed.
Essentially, a probability sample was created using occupation,
agency, and location as the controlling factors. Each occupation
has a different population size than the others and the total range
is quite large
(M employees in series 992 compared with 25,960 in series 2005).
Due to the wide variation in occupational size, it was impossible
to maintain the exact proportions in the sample, although an effort
was made to approximate the correct proportions.

Other important considerations that it was desirable to take
into account were the age, ethnic composition, grade level, and sex
of the respondents in the sample. These last variables were more
subject to the individual agencies than to OPM.

Approximately 3000 employees were sampled in the 13 occupations
being analyzed. The respondents worked in various locations around
the continental United States. Only those employees in grades 4-7
were sampled, since the Guidelines indicate that only those tasks
likely to be asked of the average employee within 5 years of entry
into the occupation should be considered in entry-level exam
development.

Results and Discussion

Of the 29c7 inventories that were completed, 21) had to be
discarded because they were too incomplete to process (e.g. the
importance scale was not filled out), respondents were not in
grades 4-7, or key biographical data was miscoded. Since the
original probability sample had specified about 2PCP respondents,
some deletion of inventories was necessary. This was particularly
the case since the overrepresentation was not consistent from
occupation to occupation. A computer program was used to randomly
select the correct proportions from each occupation.

Using the biographical data, it was possible to ascertain that
the sample matched the characteristics of the overall population on
almost all of the relevant variables, including minority
representation.

Since, as expected, there was a high degree of correlation
between the ratings on time spent and importance on the duty/tasks,
it was only necessary to analyze the data on the latter scale. The
duty/task ratings on importance were first analyzed using the
method advocated by Christal (1974). Each task rating was converted
to a percentage of all the ratings from each respondent. Then these
percentages were averaged over all raters and not just those
performing the task.
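
The percentage conversion just described can be illustrated with a
short sketch. The data layout (one dictionary of task ratings per
respondent, with unperformed tasks simply absent) and the function
name are assumptions made for the example, not details taken from
Christal (1974) or from the JFA analysis itself.

```python
def mean_relative_importance(respondents):
    """Convert each respondent's ratings to percentages of that
    respondent's rating total, then average over ALL respondents
    (a task a respondent did not rate counts as 0 percent)."""
    usable = [r for r in respondents if sum(r.values()) > 0]
    if not usable:
        raise ValueError("no respondent supplied any ratings")
    tasks = sorted({task for r in usable for task in r})
    totals = dict.fromkeys(tasks, 0.0)
    for ratings in usable:
        rating_sum = sum(ratings.values())
        for task, rating in ratings.items():
            totals[task] += 100.0 * rating / rating_sum
    return {task: totals[task] / len(usable) for task in tasks}


# Hypothetical ratings from two respondents on an invented scale.
result = mean_relative_importance([
    {"file forms": 3, "answer inquiries": 1},
    {"answer inquiries": 2},
])
print(result)  # {'answer inquiries': 62.5, 'file forms': 37.5}
```

Averaging over all raters, rather than only those who perform a
task, means that a task rated highly by a few respondents does not
outrank a task rated moderately by nearly everyone.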

The importance ratings for all the duty/tasks were analyzed at
both the occupational and combined level. A rational criterion was
set in order to identify the most important tasks. The criterion
was suggested by the form of the data: there seemed to be a natural
dichotomy between the top 10 or 15% of the rated duty/tasks and the
remainder, at both levels of analysis. In general, duty/tasks rated
by at least 23-30% of the respondents had a good chance of being
included in the top group.

At the overall level, 32 of the 222 duty/tasks exceeded the
criterion mean. Approximately the same number of duty/tasks
exceeded the criterion at the occupational level as well. Although
each occupation had critical tasks idiosyncratic to the occupation,
there was a great deal of overlap between the critical tasks at the
occupational level and at the combined level. Table 2 lists the
percentage overlap between the critical duty/tasks of each
occupation and the critical duty/tasks identified at the combined
level. As can be seen, all but one of the occupations have an
overlap of approximately 50% or more with the overall critical
duty/tasks. The exception is the 1105 series with a 35% overlap. At
the upper end we have the 1107 series with a 90% overlap.
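
One way to compute overlap figures of this kind is sketched below.
It is a hedged illustration only: the text does not state which
denominator was used, so the choice here (the number of critical
tasks in the occupation) is an assumption, as are the function
names and the toy data.

```python
def critical_tasks(mean_importance, criterion):
    """Tasks whose mean relative importance exceeds the criterion."""
    return {task for task, value in mean_importance.items()
            if value > criterion}


def overlap_proportion(occupation_critical, combined_critical):
    """Share of an occupation's critical tasks that are also critical
    at the combined level (denominator is an assumption)."""
    if not occupation_critical:
        return 0.0
    shared = occupation_critical & combined_critical
    return len(shared) / len(occupation_critical)


# Hypothetical mean importance values for five invented tasks.
combined = critical_tasks(
    {"t1": 2.0, "t2": 1.5, "t3": 0.2, "t4": 1.1, "t5": 0.1}, 1.0)
occupation = critical_tasks(
    {"t1": 1.8, "t2": 0.4, "t3": 1.6, "t4": 1.3, "t5": 1.2}, 1.0)
print(overlap_proportion(occupation, combined))  # prints: 0.5
```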

In order to obtain a more precise view of the
interrelationship among the 13 occupations, cluster analyses were
performed on the importance ratings from the various occupations. A
method proposed by Johnson (1967) was used because of its
compatibility with data possessing only known rank-order
properties. The algorithm has the following properties: 1) input
consists solely of n(n-1)/2 similarity measures among the n objects
under study and 2) the clustering is essentially invariant under
monotone transformations of the input similarity data.

The similarity matrix was constructed using the measure
described by Archer (1966). The overlap between any pair of
variates in the form of relative frequencies is found by selecting
the minimum proportion from each pair of corresponding entries and
then forming the sum of these minima. A sum of 0 indicates no
correspondence while a sum of 1 indicates a perfect correspondence.
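
The overlap measure can be illustrated as follows. The sketch
assumes that each occupation is represented by a profile of
proportions over the same ordered list of tasks, each profile
summing to 1; the function names and the three toy profiles are
hypothetical.

```python
def overlap(p, q):
    """Sum of the smaller proportion in each corresponding pair."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same tasks")
    return sum(min(a, b) for a, b in zip(p, q))


def similarity_matrix(profiles):
    """All n(n-1)/2 pairwise overlaps, keyed by unordered pair."""
    names = list(profiles)
    return {frozenset((a, b)): overlap(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


# Hypothetical importance profiles over three tasks.
profiles = {
    "A": [0.5, 0.5, 0.0],
    "B": [0.25, 0.25, 0.5],
    "C": [0.0, 0.0, 1.0],
}
sims = similarity_matrix(profiles)
print(sims[frozenset(("A", "B"))])  # prints: 0.5
print(sims[frozenset(("A", "C"))])  # prints: 0.0
```

Identical profiles give an overlap of 1 and profiles with no task in
common give 0, matching the interpretation given above.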

Johnson specifies two possible approaches to the
identification of cluster or similarity solutions, namely, the
minimum and maximum methods. In the maximum method, for each
clustering the diameter of the cluster is computed. The value of
the clustering is the maximum diameter of the various clusters. The
minimum method follows the same initial steps as the maximum
procedure; however, the distance between any sequence of objects is
determined. The procedure can identify long serpentine-like shapes.
The total size of a chain is the largest link distance, while the
chain distance between any two clusters is the minimal chain size
of all chains between the two.
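
The two methods can be sketched as an agglomerative procedure on a
similarity matrix. This is a minimal illustration and not the
program used in the study. It assumes similarities rather than
distances, so the minimum (connectedness) method joins clusters on
their most similar pair of members and the maximum (diameter)
method on their least similar pair; the function name and the toy
similarities are hypothetical.

```python
from itertools import combinations


def johnson_clusters(names, sim, method="min"):
    """Hierarchical clustering from pairwise similarities.

    sim maps frozenset({a, b}) -> similarity. Returns a list of
    (level, merged_cluster) in the order the merges occur.
    """
    if method not in ("min", "max"):
        raise ValueError("method must be 'min' or 'max'")
    # Smallest distance corresponds to largest similarity.
    link = max if method == "min" else min

    def between(c1, c2):
        return link(sim[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters that are currently most similar.
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: between(*pair))
        merges.append((between(c1, c2), c1 | c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


pairs = {("A", "B"): 0.9, ("C", "D"): 0.8, ("B", "C"): 0.6,
         ("A", "C"): 0.3, ("A", "D"): 0.2, ("B", "D"): 0.4}
sim = {frozenset(k): v for k, v in pairs.items()}
names = ["A", "B", "C", "D"]
print([lvl for lvl, _ in johnson_clusters(names, sim, "min")])
print([lvl for lvl, _ in johnson_clusters(names, sim, "max")])
```

In this toy case both methods form the same two pairs first, but
the final merge occurs at .6 under the connectedness method and at
.2 under the diameter method, since the latter requires every
member of one cluster to resemble every member of the other.
Because only comparisons between similarities are used, any
monotone increasing transformation of the input leaves the
sequence of merges unchanged.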

In the present study both procedures were used although it was
felt that the minimum or connectedness method would probably yield
the most interpretable results. This is because of the inherent
nature of the task data across occupations; each occupation will
have a number of idiosyncratic tasks that will make compact single
clusters more unlikely but linkages more likely. A similarity
matrix was computed for the importance data from the 13
occupations. This similarity matrix was then analyzed according to
the minimum and maximum methods with the results appearing in
Tables 3 and 4 respectively. The graphical display is generally in
the form of a right triangle where objects are merged one by one
until all are included. As expected, the minimum method produced
somewhat more coherent results although there was significant
inter-agreement. In Table 3 it can be seen that there are two main
clusters of occupations; the first comprises series 962, 992, 1105,
1107, 1411, and 1421, while the second comprises series 963, 986,
990, 1702, and 2005. Even these two clusters quickly become merged
at a relatively high level (.599). The 344 and 1106 series on the
other hand seem to be anomalous in terms of their response pattern
since the inclusion of these two series substantially decreases the
index of similarity (.250).

In Table 4 we see the results from the maximum or diameter
method. Again, series 344 and 1106 appear to be the most resistant
to clustering. The two main clusters of the connectedness method
are divided into four, possibly for the reasons discussed above.
The various clusters do merge, but at a lower level than in the
connectedness method (.20). It should be noted in connection with
the two anomalous series that there is some indication that duties
assigned by the employing agencies have a significant amount of
heterogeneity in these series. This may account for their position
as outliers.

The results indicate that at least 11 of the 13 occupations
possess substantial commonality among themselves, based on the
overall pattern of importance ratings on the duty/tasks. Since even
with the two outlying series there was substantial overlap on the
most critical tasks, the commonality may even be stronger than
suggested by the cluster analysis, at least as far as the
Guidelines are concerned. The Guidelines indicate that different
jobs can be grouped together for validity studies when they have
substantially the same major work behaviors (U.S. EEOC et al.,
p. 38300, section 14b).

Similar findings for clerical occupations have been reported
by Abbe (1980). A single large cluster of clerical occupations was
established using a variety of techniques, including the present
approach, factor analysis, and principal component analysis. Thus,
there is good reason to believe that more cost-effective methods of
job analysis can be employed in examination development consistent
with legal and methodological requirements.

Table 1

      Series     Title

 1.   GS-344     Management Assistant
 2.   GS-962     Contact Representative
 3.   GS-963     Legal Instruments Examiner
 4.   GS-986     Legal Technician
 5.   GS-990     General Claims Examiner
 6.   GS-992     Loss and Damage Claims Examiner
 7.   GS-1105    Purchasing Agent
 8.   GS-1106    Procurement Technician
 9.   GS-1107    Property Disposal Technician
10.   GS-1411    Library Technician
11.   GS-1421    Archives Technician
12.   GS-1702    Education and Training Technician
13.   GS-2005    Supply Technician

Table 2

Proportion of Overlap Between the Most Important
Tasks at the Combined Level with the Occupational Level


Series     Proportion of Overlap
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1  1 07 
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.50 
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.81 

2005 

.81 

Table 3

Cluster analysis using the connectedness method

Table 4

Cluster analysis using the diameter method

COMPUTERIZED ADAPTIVE TESTING: FROM PSYCHOMETRICS TO SYSTEMS


Paul  R.  Croll 

Director,  CAT  System  Development  Project 
U.S.  Office  of  Personnel  Management 


The design of a Computerized Adaptive Testing (CAT) system requires
the careful integration of psychometric and engineering developments.
System designers must be cognizant of the relationships among the
psychometric components of the system, among the physical components
of the system, and between its psychometric and physical components.
A hierarchical functional design model has been developed to
facilitate the design of a prototype CAT system for DoD enlisted
personnel accessions testing. The model addresses both the
psychometric and administrative/operational requirements of the
system and serves as the blueprint for translating these functional
requirements into a working prototype. The focus on function in this
design approach ensures that the description of CAT psychometric
procedures, in terms of system functions, precedes specification of
the physical elements of the system through which these procedures
are implemented. Such precedence is critical to the psychometric
integrity of the system development effort.


Computerized adaptive testing (CAT) is a remarkably effective combination
of recent developments in latent trait theory and continuing advances in
computer technology (Urry, 1977a). Unlike conventional paper-and-pencil
group testing, in which identical test forms are administered
simultaneously to large groups of examinees, computerized adaptive testing
is an individualized testing procedure that constructs, administers, and
scores tests interactively during the actual testing session. Each examinee
receives only those questions appropriate to his or her own level of
ability, resulting in an individualized test "adapted" or "tailored" to the
specific examinee's level. The number of questions required to produce an
estimate of ability at the same level of reliability as the longer group
test is considerably smaller. In 1979, the Department of Defense
established a joint-service project, led by the Navy Personnel Research and
Development Center, to evaluate the feasibility of CAT for enlisted
personnel accession testing. The project has been conceived as a
large-scale system development effort, integrating psychometric and
engineering developments to meet system goals.

CAT  System  Design  Principles 

The primary objectives of the CAT system development effort are the
design, development, test and evaluation of a system for automated adaptive
administration of the Department of Defense's enlisted personnel selection
and classification tests. The desired outcome of the development effort is
an integrated set of well-defined inputs, processes, and outputs satisfying
the following criteria: (1) user (i.e. service) needs are translated into
specifications that both define system products and provide for control of
system processes; (2) system products completely and consistently conform
to user specifications; and (3) system processes and products are
continually monitored to ensure such conformance. The capability for
delivery of well-defined products, meeting user needs and monitored for
conformance with user specifications, is the essence of a CAT system.

The system development problem is approached through two distinct lines
of development: psychometric development of the procedures for adaptive
testing and engineering development of the physical system through which
these procedures will be implemented. The application of system design
principles to the development of the computer-based physical system is
straightforward and well supported by present practice. The application of
such principles to the development of psychometric procedures, however, is
unique and can present a subtle danger to the integrity of the system as a
whole.

The danger lies in the possible failure to recognize that the CAT system
must be designed to meet psychometric objectives first. Engineering
objectives must not be permitted to drive the system development effort.
For example, modification of well-proven CAT algorithms, to fit an initial
conception of hardware performance characteristics, is inappropriate.
Rather, the algorithms chosen should dictate hardware specifications.
Viewing CAT system development as simply another data processing system
development exercise is likely to compromise the system's psychometric
integrity. Recognition of the tremendously complex network of interactions
underlying systems design is especially necessary in CAT system
development. System designers must be cognizant of the relationships among
the psychometric components of the system, among the physical components of
the system, and between its psychometric and physical components.
Appreciation of these relationships is critical in integrating these
components into a well-functioning system.

In order to facilitate such integration, the design strategy chosen for
the CAT system has focused on function rather than structure. A
hierarchical functional design model has been developed addressing both the
psychometric and administrative/operational requirements of the system.
This model consists of a set of hierarchical functional descriptions of
system components and their interrelationships. These detailed descriptions
serve as the basis for design of the system structure, system prototyping,
and final system development. A useful technique for developing such
functional descriptions is known as HIPO (Hierarchy plus
Input-Process-Output) (IBM, 1975; Katzan, 1976). HIPO is a technique that
describes system functions in terms of inputs, processes, and outputs.
These functional descriptions are presented hierarchically, showing in
greater and greater detail the functional relationships within and between
system components. Use of the HIPO technique aids system development, in
that all required inputs, processes, and outputs at each level of
functional detail are specified.
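
The idea of a hierarchy of input-process-output descriptions can be
sketched in a few lines of Python. This is only an illustration, not the
HIPO notation itself: the class name SystemFunction, the helper
table_of_contents, and the numbering scheme are assumptions made for the
example, and the entries are paraphrased from the item banking function
described later in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class SystemFunction:
    """One node of the hierarchy: a function with inputs, process steps, outputs."""
    name: str
    inputs: list
    process: list
    outputs: list
    children: list = field(default_factory=list)


def table_of_contents(function, number="1", depth=0):
    """Return the hierarchy as indented, numbered lines of text."""
    lines = ["  " * depth + number + " " + function.name]
    for position, child in enumerate(function.children, start=1):
        lines.extend(table_of_contents(child, f"{number}.{position}", depth + 1))
    return lines


# Illustrative content only, paraphrased from the item banking function.
item_banking = SystemFunction(
    name="Item banking",
    inputs=["test results"],
    process=["calibrate items", "construct bank", "evaluate bank"],
    outputs=["item banks"],
    children=[
        SystemFunction("Test item calibration", ["test results"],
                       ["estimate item parameters"], ["parameter estimates"]),
        SystemFunction("Item bank construction", ["parameter estimates"],
                       ["screen and sort items"], ["tentative item bank"]),
        SystemFunction("Item bank evaluation", ["tentative item bank"],
                       ["simulate adaptive testing"], ["evaluated item bank"]),
    ],
)

toc = table_of_contents(item_banking)
print("\n".join(toc))
```

Each level of the printed outline corresponds to one level of functional
detail; the inputs, process, and outputs fields carry the specification
that a HIPO diagram would show for that function.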

Katzan (1976) describes a system function as a process that accepts one
or more inputs and produces one or more outputs. The application of this
definition in computer hardware or software design is straightforward. For
example, the multiply function of a CPU chip accepts a multiplier and
multiplicand, each of fixed length, and returns a product. Valid input
sources and output destinations are inherent in the chip design. The
application to software design is analogous, with the program code
determining input sources and characteristics, output destinations and
characteristics, and the intervening processing steps necessary to produce
output from input. The application of this definition to the design of a
psychometric system is less obvious. Even Chapanis, writing on both human
factors in systems engineering and on systems staffing, in de Greene's text
on systems psychology (1970a, 1970b), neglects the application of system
design principles to the development of psychometric procedures. Systems
thinking is only applied to the problem of personnel selection and
classification, and then only in the sense that a systematic approach to
selecting, evaluating, and training personnel is seen as a component of a
larger system design. Systems thinking need not stop short with the human
factors or engineering psychology approach, however. It is readily
applicable to basic psychometric developments as well.

If one defines a personnel measurement procedure as the administration,
scoring and evaluation of the results of a test of some ability, questions
couched in system design terms can easily be raised. What are the desired
outputs: test records, scores, selection decisions? What are the processes
required to obtain those outputs: administration of test questions,
recording of examinee responses, scoring, application of selection rules?
What are the inputs required by the specified processes to produce the
desired outputs: instruction sets, test questions, examinee responses,
scoring keys? To be sure, this is a simplistic example. However, it does
illustrate that psychometric issues such as personnel measurement may be
addressed from a system design perspective, bringing to bear all the tools
and techniques of that discipline. The design of a CAT system is a far more
complex undertaking, yet the development of a functional design model for
the system greatly simplifies the dual tasks of psychometric and
engineering development and facilitates their eventual integration.
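
The simplistic example above can be written as a single function whose
arguments are the inputs, whose body is the process, and whose return
value holds the outputs. The sketch below is illustrative only: the
function name, the dictionary layout, and the number-right cutoff rule
are assumptions, not part of any procedure described here.

```python
def administer_and_score(questions, scoring_key, responses, cutoff):
    """Inputs: test questions, scoring key, examinee responses, selection cutoff.
    Process: record each response, score it against the key, apply the rule.
    Outputs: test record, score, selection decision."""
    record = [
        {"question": question, "response": response, "correct": response == key}
        for question, response, key in zip(questions, responses, scoring_key)
    ]
    score = sum(entry["correct"] for entry in record)
    return {"record": record, "score": score, "selected": score >= cutoff}


# Hypothetical three-question test with a number-right cutoff of 2.
result = administer_and_score(
    questions=["Q1", "Q2", "Q3"],
    scoring_key=["B", "D", "A"],
    responses=["B", "C", "A"],
    cutoff=2,
)
print(result["score"], result["selected"])
```

Stating the procedure this way makes it plain which inputs must exist
before the process can run, which is the question the functional
analysis is meant to raise.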


1  The development of a functional design model for a CAT system has been
based on both the analysis of NPRDC-specified requirements and objectives
and the author's experience with the design of a similar system at the U.S.
Office of Personnel Management (see Croll & Urry, Note 1).


Functional Overview of a Computerized Adaptive Testing System

In computerized adaptive testing, tests are constructed, administered,
and scored interactively during the actual testing session. What functions
are necessary to this process? First, it is obvious that a function
encompassing test construction, administration and scoring can be defined.
Is this sufficient? Where do the test questions to be administered come
from? In CAT, test questions for each ability to be tested are selected for
administration from a set of questions called an item bank. Item banks are
carefully constructed sets of test questions having well-specified
psychometric properties, with each item bank designed to measure a single
ability. It then becomes obvious that a function providing for item banking
must also be defined. Are these functions now sufficient? Remember that in
CAT, a test is terminated when a pre-specified level of reliability is
reached. Also, in multiple ability testing, a weighted composite score may
be required. Where do the termination rules and score weights come from? It
seems, then, that a function providing such measurement control parameters
is also required. Have all the functions necessary for CAT now been
defined? In a perfect world perhaps, but in the real world things go wrong.
How will we know if things go wrong in CAT? A function providing for
monitoring of CAT functioning and for quality control reporting would do
nicely.

Through  the  application  of  such  a  simple  functional  analysis  to  the  CAT 
process,  the  four  major  functions  of  a  computerized  adaptive  testing  system 
have  been  identified. 

A HIPO package for the initial design of a CAT system has been developed.
From the visual table of contents (Figure 1) and the CAT system overview
diagram (Figure 2), it can be seen that the four major system functions
have been identified: item banking, measurement control, test
administration and scoring, and monitoring and quality control. Outputs of
the item banking and measurement control components are required as inputs
to the test administration and scoring component, and outputs from the test
administration and scoring component are required as inputs for monitoring
and quality control.
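
The input/output dependencies among the four functions form a small
directed graph, and an order in which the functions can be supplied with
their inputs follows from a topological sort. The sketch below uses the
standard-library graphlib module (Python 3.9 or later); the dictionary
name DEPENDS_ON and the helper execution_order are illustrative
assumptions.

```python
from graphlib import TopologicalSorter

# Each function maps to the set of functions whose outputs it needs as inputs.
DEPENDS_ON = {
    "item banking": set(),
    "measurement control": set(),
    "test administration and scoring": {"item banking", "measurement control"},
    "monitoring and quality control": {"test administration and scoring"},
}


def execution_order(depends_on):
    """Return the functions in an order that respects the input dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
print(order)
```

Item banking and measurement control may appear in either order, since
neither depends on the other; the remaining two functions always follow
them.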


The  Item  Banking  Function 

This component of the CAT system provides the sets of test questions, or
item banks, necessary for adaptive test administration. It is composed of
three subfunctions: test item calibration, item bank construction, and item
bank evaluation.


Test item calibration refers to the estimation of the latent trait
parameters a_i, b_i, and c_i of candidate test questions for item banking
(Urry, Note 2). Input for this subfunction consists of results from either
conventional or adaptive administration of the potential test questions. If
parameters are to be estimated from conventional test results, examinee
response data and scoring keys for the questions must be supplied. If
parameters are to be estimated from adaptive test results, ability scores
must be supplied as well.
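
To make the role of the item parameters concrete, the sketch below
evaluates an item response function. It assumes a three-parameter
logistic form with discrimination a, difficulty b, and guessing
parameter c, and the conventional scaling constant D = 1.7; the choice
of the logistic form is an assumption made for illustration, and the
cited work may use a different form (for example, the normal ogive).

```python
import math


def prob_correct(theta, a, b, c, D=1.7):
    """Probability of a correct response at ability theta under an assumed
    three-parameter logistic model: c + (1 - c) / (1 + exp(-D * a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# At theta equal to the difficulty b, the probability is halfway between c and 1.
p_at_difficulty = prob_correct(theta=0.0, a=1.0, b=0.0, c=0.2)
print(p_at_difficulty)
```

Calibration is the inverse problem: given observed responses, estimate
the a, b, and c values for each candidate question.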


An algorithm for the estimation of parameters from conventional test
results has been described by Urry (Note 3). Schmidt and Urry (1976) have
also described the results of an algorithm for the estimation of parameters
from adaptive test results. This algorithm is also described in Urry (Note
3). These algorithms are suggested as a guide for the design of the
system's parameter estimation subfunctions. Parameter estimation from
adaptive test results is especially important in CAT in that it permits
on-line calibration of potential test questions in the normal operational
context of the CAT system. It provides a means to eventually end dependence
on conventional test results for item parameter estimation.

The test item calibration subfunction outputs parameter estimates and
calibration statistics for the potential test questions. The parameter
estimates are then treated as input to the item bank construction
subfunction.


Functional overview of the DoD CAT system


The item bank construction subfunction takes the parameter estimates for
candidate questions and compares them against target values for the a_i and
c_i parameters. The prescription for acceptable values of these parameters
has been detailed by Urry (1470, 1977b; Note 2). Questions failing to meet
this prescription are rejected. The remaining item parameter sets are then
sorted to ease later processing and a rectangular distribution of the
items, by the b_i parameter, is built. Urry's prescriptions for the size
and distributional shape of an item bank may then be followed in selecting
questions for inclusion. The tentative item bank produced by this
subfunction is then evaluated in the item bank evaluation subfunction.
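
The screening and sorting steps can be sketched as follows. The
threshold values, the interval and bin counts, and the rule of keeping
the most discriminating items within each difficulty interval are all
placeholders chosen for illustration; they are not the prescription
cited above. Items are assumed to carry a discrimination value a, a
difficulty value b, and a guessing value c.

```python
def build_item_bank(candidates, min_a, max_c, b_low, b_high, n_bins, per_bin):
    """Reject items outside the target values, then keep an equal number of
    items per difficulty interval so the bank is roughly rectangular in b."""
    accepted = [
        item for item in candidates
        if item["a"] >= min_a and item["c"] <= max_c and b_low <= item["b"] <= b_high
    ]
    width = (b_high - b_low) / n_bins
    bins = [[] for _ in range(n_bins)]
    for item in accepted:
        index = min(int((item["b"] - b_low) / width), n_bins - 1)
        bins[index].append(item)
    bank = []
    for members in bins:
        # Illustrative choice: prefer the most discriminating items in each interval.
        members.sort(key=lambda item: item["a"], reverse=True)
        bank.extend(members[:per_bin])
    return sorted(bank, key=lambda item: item["b"])


# Hypothetical candidate items.
candidates = [
    {"id": "i1", "a": 1.2, "b": -1.5, "c": 0.20},
    {"id": "i2", "a": 0.5, "b": -1.0, "c": 0.20},  # rejected: a too low
    {"id": "i3", "a": 1.0, "b": -0.5, "c": 0.40},  # rejected: c too high
    {"id": "i4", "a": 0.9, "b": 0.3, "c": 0.10},
    {"id": "i5", "a": 1.5, "b": 0.8, "c": 0.25},
    {"id": "i6", "a": 1.1, "b": 1.6, "c": 0.15},
    {"id": "i7", "a": 1.3, "b": -1.8, "c": 0.10},
]
bank = build_item_bank(candidates, min_a=0.8, max_c=0.3,
                       b_low=-2.0, b_high=2.0, n_bins=2, per_bin=2)
print([item["id"] for item in bank])
```

In this toy run, two items fail the screen and one more is dropped from
the upper difficulty interval, which would otherwise hold three items
against two in the lower interval.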

The item bank evaluation subfunction is designed to assess the
performance characteristics of an item bank before it is placed into
operational use. It is a critical component of the CAT system design, in
that item bank performance characteristics are a major determinant of CAT
system performance. A procedure for the evaluation of an item bank has been
described by Urry (1974). From the functional perspective, the item
parameter sets for the tentative item bank are used to generate response
vectors (1's and 0's, or rights and wrongs) for simulated examinees.
Termination rules are selected for item bank evaluation, based on the
desired reliability of measures to be constructed from the bank (Urry
1977b, Note 2). These rules are provided by setting a prespecified value of
the error of the ability estimate, at which the test sequence will be
terminated. Adaptive testing is then simulated using the item parameter
sets, response vectors, and termination rules, and the results are
reported. Only if the item bank is judged acceptable is it made available,
with associated question text, for operational use. The procedural steps in
the item banking function are repeated for each ability for which an item
bank is to be constructed.
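
Generation of response vectors for simulated examinees might look like
the sketch below. It assumes a three-parameter logistic response model
with scaling constant 1.7 and a seeded pseudo-random generator; both are
assumptions for illustration, and the cited evaluation procedure may
differ in model and in detail.

```python
import math
import random


def simulate_responses(items, abilities, seed=0):
    """Return one vector of 1's and 0's per simulated examinee.

    items is a list of (a, b, c) parameter sets; abilities is a list of
    true ability values for the simulated examinees."""
    rng = random.Random(seed)
    vectors = []
    for theta in abilities:
        vector = []
        for a, b, c in items:
            p = c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))
            vector.append(1 if rng.random() < p else 0)
        vectors.append(vector)
    return vectors


# Hypothetical two-item bank and three simulated examinees.
items = [(1.0, 0.0, 0.2), (1.5, 1.0, 0.1)]
vectors = simulate_responses(items, abilities=[-1.0, 0.0, 1.0], seed=1)
print(vectors)
```

Fixing the seed makes a simulation run repeatable, so that two candidate
item banks can be compared on the same simulated examinees.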

Additionally,  when  several  item  banks  are  to  be  administered  as  a  multi¬ 
ple-ability  battery,  simulation  of  adaptive  testing  with  the  complete  set  of 
banks  is  conducted. 


The  Measurement  Control  Function 


This function is the most critical component of the CAT system. It
provides the means through which user (i.e. service) answers to the three
basic questions underlying CAT system operation are translated into system
control parameters. These three questions are (1) "What is to be
measured?"; (2) "What degree of accuracy is to be employed?"; and (3) "How
are subtest scores to be combined into composite scores?". Without such a
function, users have no way of communicating their requirements to the
system. User requirements are communicated to system personnel who, in
turn, specify the measurement protocols required to meet the user's needs.
These protocols embody the measurement requirements of each system user and
determine both the way in which the adaptive testing process proceeds and
the nature of outputs of that process. They specify the combination of
subtests required to meet specific measurement objectives (e.g. full ASVAB
vs. AFQT, or service-specific composites), the outputs desired (e.g.
subtest scores vs. weighted composite scores), the scale of measurement
desired, and the accuracy of measurement desired. They take the form of the
input stream required by the system to generate control parameters.
It is through software generation of control parameters that user
measurement protocols are implemented in the CAT system. These parameters
are of three types: (1) termination rules, or terminal error values (values
for the error of the estimate of ability), that determine the point in the
adaptive testing sequence at which testing for a particular ability is
terminated; (2) subtest weights, that determine the relative contribution
of a subtest score to a composite score (and which may be zero, if a
subtest score is not to be included in a particular composite score); and
(3) rescaling factors, which provide for the conversion of scores based in
the system's standard scale of measurement to an alternate scale of
measurement.
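
The three types of control parameters can be pictured as one small
record per measurement protocol. In the sketch below the class and field
names, the subtest names, and the linear form of the rescaling
(reported score = slope * composite + intercept) are assumptions made
for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ControlParameters:
    terminal_error: dict    # subtest -> error value at which testing stops
    weights: dict           # subtest -> weight in the composite (0 excludes it)
    scale_slope: float      # rescaling factors, assumed linear for this sketch
    scale_intercept: float


def composite_score(subtest_scores, params):
    """Weight the subtest scores, then convert to the alternate scale."""
    composite = sum(
        params.weights.get(name, 0.0) * score
        for name, score in subtest_scores.items()
    )
    return params.scale_slope * composite + params.scale_intercept


# Hypothetical protocol: two weighted subtests, one excluded by a zero weight.
protocol = ControlParameters(
    terminal_error={"verbal": 0.3, "arithmetic": 0.3, "mechanical": 0.4},
    weights={"verbal": 2.0, "arithmetic": 1.0, "mechanical": 0.0},
    scale_slope=10.0,
    scale_intercept=50.0,
)
reported = composite_score(
    {"verbal": 0.5, "arithmetic": 1.0, "mechanical": 2.0}, protocol
)
print(reported)
```

A zero weight removes the mechanical subtest from this composite without
removing it from the battery, which matches the role of zero weights
described above.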

The Measurement Control function must provide the capability for
translation of a wide range of user measurement protocols into appropriate
control parameters. It can become quite complex as the number and
complexity of distinct user protocols increases. The psychometric bases for
this function have been discussed by Urry (Note 2, Note 3). Its
implementation is dependent upon several necessary conditions of the total
system design: (1) a Bayesian modal solution for item parameter estimates
must be used; (2) the Owen Bayesian algorithm must serve as the basis for
item selection and ability estimation; and (3) a variable test length
termination strategy, based on target values of the standard error of the
estimate of ability (for each subtest), must be employed.


The  Test  Administration  and  Scoring  Function 

This  component  of  the  CAT  system  provides  for  the  administration  and  scor¬ 
ing  of  adaptive  tests  in  the  live  testing  environment.  It  is  what  is  often 
thought  of  as  the  sole  function  of  a  CAT  system,  since  it  is  the  primary  sys¬ 
tem  function  that  is  implemented  in  the  field-resident  physical  system.  It 
is  composed  of  six  subfunctions. 

The system start-up subfunction is composed of those steps necessary to
prepare the physical system (i.e., the hardware and software) for a testing
session. It includes power-up, self-test, sign-on, and system status
verification activities.


The examinee log-in subfunction performs the administrative tasks
required to identify the examinee to the system and to link the examinee's
test record with the other steps in the applicant processing sequence.
Inputs include data from administrative forms and examinee-supplied data.
Outputs include administrative forms and the examinee record into which the
test results will later be written. Additionally, a lower-level subfunction
has been specified that ensures that examinees are correctly seated at the
testing stations to which they have been assigned.


The familiarization subfunction is designed to familiarize the examinee
with both the hardware to be used in taking an adaptive test and the
adaptive testing process itself. Introductory, instructional, and practice
material is displayed for the examinee on the testing station display, and
the examinee enters the required responses on the testing station keyboard.
Checks are included to ensure that the examinee is proceeding through the
familiarization sequence successfully. An option has also been designed for
the examinee to request a repeat of the familiarization sequence. Inputs
include introductory, instructional, and practice text and examinee
responses. Outputs consist of displays of the input text and error
messages.

The primary test subfunction is the heart of the test administration and
scoring function. It is designed to select and display test questions, read
and score examinee responses, and update the examinee test record.
Additionally, it provides for the administration of experimental items
(through branching to another subfunction), selective retests, and test
result recording on the testing site's master file. Inputs include control
parameters, item parameters, item text, and examinee responses. Outputs
include test item displays, error message displays, and the examinee test
record.

Several lower-level subfunctions have been specified. The item
administration subfunction selects and displays test questions, reads
examinee responses, and displays an error message when appropriate. It also
scores examinee responses and updates the estimate of ability and its
associated error value. Additionally, it provides for termination of the
testing sequence in a particular ability by checking the current error
value of the ability estimate against the pre-specified terminal error
value. Since the item selection and ability and error updating procedures
are psychometrically complex, lower level subfunctions have been identified
for these procedures, but not specified in separate HIPO diagrams.
Decisions regarding the nature of these subfunctions will have to be made
within the context of the system's psychometric development activities. The
reader is referred to Urry (1977b; Note 2, Note 3) for guidance in
developing these procedures.
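
The select, score, update, and check-for-termination cycle can be
sketched as a short loop. The sketch is not the Owen Bayesian algorithm:
it substitutes a simple grid approximation to the posterior distribution
of ability, a standard normal prior, a logistic response model, and a
"closest difficulty" item selection rule, all of which are assumptions
made so that the example stays short and self-contained.

```python
import math


def p_correct(theta, a, b, c):
    """Assumed three-parameter logistic response model."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))


def posterior_mean_sd(grid, weights):
    """Mean and standard deviation of ability over the weighted grid."""
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    variance = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(variance)


def administer_adaptive_test(bank, answer, terminal_error):
    """bank maps item id -> (a, b, c); answer(item_id) returns 1 or 0.
    Testing stops when the error of the ability estimate reaches the
    terminal error value, or when the bank is exhausted."""
    grid = [g / 10 for g in range(-40, 41)]
    weights = [math.exp(-0.5 * t * t) for t in grid]  # standard normal prior
    remaining = dict(bank)
    administered = []
    estimate, error = posterior_mean_sd(grid, weights)
    while remaining and error > terminal_error:
        # Select the unused item whose difficulty is closest to the estimate.
        item_id = min(remaining, key=lambda i: abs(remaining[i][1] - estimate))
        a, b, c = remaining.pop(item_id)
        score = answer(item_id)
        # Update the posterior weights with the likelihood of the response.
        weights = [
            w * (p_correct(t, a, b, c) if score else 1 - p_correct(t, a, b, c))
            for t, w in zip(grid, weights)
        ]
        estimate, error = posterior_mean_sd(grid, weights)
        administered.append(item_id)
    return estimate, error, administered


# Hypothetical bank of 17 items and an examinee who answers correctly
# whenever the item difficulty is at or below 0.5.
bank = {f"item{k}": (1.2, -2.0 + 0.25 * k, 0.2) for k in range(17)}
estimate, error, administered = administer_adaptive_test(
    bank, answer=lambda item_id: 1 if bank[item_id][1] <= 0.5 else 0,
    terminal_error=0.4,
)
print(round(estimate, 2), round(error, 2), len(administered))
```

The terminal_error argument plays the role of the pre-specified terminal
error value supplied by the measurement control function; lowering it
lengthens the test.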

The experimental item subfunction provides for the administration of
experimental, or potential, test questions within the context of an
adaptive test. It selects and displays experimental items and reads and
records examinee responses. Inputs include item bank codes, item text and
examinee responses. Outputs include item text displays and examinee
responses to the items. This subfunction is called by the primary test
subfunction when control codes indicate that experimental items are to be
administered.

The  test  result  reporting  subfunction  is  designed  to  provide  printed  re¬ 
ports  of  test  results,  including  any  required  administrative  forms.  It  in¬ 
puts  data  from  the  testing  site's  configuration  master  file  and  prints  reports 
as  required.  It  is  also  designed  to  feed  testing  results  into  the  AFEES  Re¬ 
porting  System. 


The  Monitoring  and  Quality  Control  Function 

This component of the CAT system provides for system-wide quality control
of all CAT system functions, as well as for monitoring of the actual
testing process at the testing site. It is composed of three subfunctions:
testing station monitoring, quality control report generation, and special
report generation. The term "quality control", as used in this function,
does not simply imply physical system diagnostics and maintenance, but also
implies monitoring and control of the psychometric integrity of the CAT
system. Since the system will stand or fall on the quality of its personnel
measurement, its psychometric integrity requires constant scrutiny.


Some suggestions for the testing station monitoring subfunction are
offered. Three conditions might occur during a testing session which would
require the attention of the test monitor: the examinee may fail to
progress normally through the testing sequence, and also fail to request
assistance; the examinee may, for any reason, request test monitor
assistance; and a failure might occur in a testing station. Also,
psychometric anomalies may occur, such as the presentation of a sequence of
items that is excessively long given the characteristics of the item bank.
The testing station monitoring subfunction should provide a constant
display of testing station status, so that such conditions might be
identified. Additionally, if a testing station fails, a lower level
subfunction should be initiated to perform a recovery/restart sequence.
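
A status check of the kind suggested above might be sketched as follows.
The status labels, the argument names, the precedence among the
conditions, and the default limits (120 seconds without a response, 30
items) are hypothetical values chosen for illustration; appropriate
limits would depend on the item bank and the testing site.

```python
def station_status(seconds_since_last_response, assistance_requested,
                   hardware_ok, items_administered,
                   idle_limit=120, max_items=30):
    """Classify one testing station for the monitor's status display."""
    if not hardware_ok:
        return "STATION FAILURE"        # should trigger recovery/restart
    if assistance_requested:
        return "ASSISTANCE REQUESTED"
    if seconds_since_last_response > idle_limit:
        return "NO PROGRESS"            # examinee stalled without asking for help
    if items_administered > max_items:
        return "EXCESSIVE TEST LENGTH"  # possible psychometric anomaly
    return "OK"


print(station_status(15, False, True, 12))
print(station_status(300, False, True, 12))
```

Evaluating this check for every station at regular intervals would give
the constant display of station status that the subfunction calls for.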

The  four  major  functions  identified  in  the  CAT  functional  design  model 
suggest  a  system  structure  which  implements  each  of  these  functions  in  a  sepa¬ 
rate  subsystem,  each  with  its  own  data,  logic,  hardware,  and  software  charact¬ 
eristics.  It  is  through  their  accurate  translation  into  components  of  system 
structure  and  the  implementation  of  system  structure  in  the  physical  system, 
that  CAT  will  finally  have  made  the  transition  from  psychometrics  to  systems. 


DEVELOPMENT OF A WORK BEHAVIOR SURVEY FOR
THIRTY-FIVE  FEDERAL  CLERICAL  OCCUPATIONS 

Cynthia  C.  Diane 
Personnel  Research  Psychologist 
United  States  Office  of  Personnel  Management 


A comprehensive work survey form was developed to study thirty-five
clerical occupations in the Federal sector. The purpose of the survey
was to develop new selection procedures which are not only job related,
but which also meet the requirements of the Uniform Guidelines on
Employee Selection Procedures.

Work behavior statements were obtained from previous job analysis studies,
existing inventories, and the Dictionary of Occupational Titles. The
statements were then reviewed by three experienced incumbents in each
of the thirty-five occupations. The final survey form incorporated the
comments of these subject matter experts, and contained a listing of
174 work behaviors which crossed over the thirty-five occupations being
studied. The survey form has been completed by approximately 3000
incumbents, nationwide. Data analysis will show the degree of overlap
between these occupations and aid in the development of new selection
procedures.


The decision by the Office of Personnel Management (OPM) to undertake
a job analysis of clerical occupations was influenced by two things.
First, the clerical occupational group is the most populous in the
Federal sector. At present over twenty tests are used to select
incumbents in the different clerical occupations. The form of some of
these tests has remained unchanged for many years, while the jobs have
been remolded in response to the many technological advancements (word
processing, automatic data processing, etc.). Secondly, with the signing
of the Uniform Guidelines on Employee Selection Procedures (1978) came
the necessity either to verify the adequacy of our current examination
program, by showing that the knowledges, skills and abilities needed for
successful job performance were actually those being tested for at the
entry level, or to develop new selection procedures which met the
criteria of the Guidelines.

Our first task was to define clerical occupations and to determine
which occupations in the Federal sector met this definition. For the
purpose of this study clerical occupations were defined as "those that
involve structured work in support of office, business or fiscal
operations; performed in accordance with established policies, procedures
or techniques; and requiring training, experience, or working knowledges
related to the tasks to be performed" (U.S. Civil Service Commission,
1976, p. XIX). With the aid of the Standards we were able to come up with
thirty-five occupations which met these criteria. The final list of
these occupations appears in Appendix A. Occupations requiring
cryptographic and/or financial skills were not included in this set of
occupations for study.

Our next step was to develop a work survey instrument which could be
used to gather job information across occupations. We needed a single
instrument which incorporated the work behaviors of all thirty-five
occupations. Luckily it was not necessary to start from scratch, for
there already existed several clerical inventories (Gandy and Maier,
1979; State of Minnesota; State of Connecticut; and the Psychological
Corporation). These existing inventories, along with Classification
Standards for the occupations, previous job analysis studies and the
Dictionary of Occupational Titles, were consulted in the development
of the survey. From these sources we were able to gather a listing
of work behaviors or tasks which seemed complete. This list of tasks
was reviewed by psychologists at OPM and edited to eliminate
duplications. A draft listing of work behaviors by occupation was then
given to between three and five job incumbents and/or supervisors in each
of the thirty-five occupations for review. They examined the list for
completeness, clarity and organization. The suggestions of these
reviewers were then incorporated into the final form of the Work Survey
for Clerical Occupations.

The final inventory is divided into five parts and contains a
listing of 174 task statements or work behaviors, divided among 14
duties. The first section contains questions concerning the incumbents'
background. It includes questions on race, sex, ethnicity, education,
geographic location and job series, etc. Information gathered here
will be used to describe our sample.

The next five sections are concerned with incumbent judgments
about their jobs. Here, employees are first asked to review all
the tasks and check those they perform on the job. Next, they must
rate the tasks they perform on a relative time spent scale - a rating
of the amount of time they spent on each task they performed over
the past six months compared to all other tasks they perform. The
tasks must then be rated on a relative importance scale - the importance
of each task they performed in the last six months compared
to all other tasks they performed in the same time period.
Each incumbent must then complete the above ratings at the duty level.

Tasks were rated on a seven point scale ranging from 1 = very
much below average, to 7 = very much above average, as recommended
by Morsh and Archer (1967), because of the greater reliability and
precision of a seven point scale. A list of equipment used in these
occupations was also included in the inventory. Here, incumbents were
asked to check off the equipment they use on the job. In this section,
as in the task listing, space was allotted for the incumbent to make
additions where necessary.

The final part of the inventory elicits the participants' impressions
about the inventory's clarity, organization and coverage.

At this point the Clerical Work Behavior Inventory has been
administered to a sample of job incumbents in grades 2 through 5.
This grade span was chosen because entry into these series is usually
at the 2 level with an average full performance level of 5.

Initially,  it  was  planned  to  sample  200  incumbents  from  each 
of  the  thirty-five  clerical  occupations,  but  after  determining  the 
total  population  for  each  occupation  this  was  not  considered  feasible. 
The  sampling  strategy  decided  upon  took  into  consideration  the  pop¬ 
ulation  of  each  occupational  series  and  in  so  doing  attempted  to 
have  fair  representation  of  incumbents  in  each  of  the  thirty-five 
series. 

Steps taken to derive the sample were as follows:

(1) The CPDF (Central Personnel Data File) was used to determine
the population size of each occupation.

(2) Sample size by occupation was determined by the above
information. The most populous occupations were represented by
a larger sample size.

(3) Using CPDF data, the largest employing agencies for each
of the occupations were determined.

(4) Knowing the size of the sample needed for each occupation,
we took the five agencies employing the largest number of incumbents
for each series and weighted the number of incumbents from these
agencies to the total size of the sample needed for each occupation.

(5) The number of people was chosen in order to obtain the largest
grade range and geographical spread for each occupation and each
agency within each occupation.

In occupations with fewer than 200 incumbents a sample as close to
100 percent as possible was drawn.
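
The allocation logic in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study does not give its exact weighting formula, so simple proportional allocation is assumed here, and the function names, the overall sample budget and the population counts are hypothetical.

```python
def allocate_sample(populations, total_sample, census_threshold=200):
    """Return a hypothetical sample size for each occupation.

    Occupations with fewer than census_threshold incumbents are sampled
    in full; the rest of the budget is shared among the larger occupations
    in proportion to their population (an assumed weighting rule).
    """
    small = {occ: n for occ, n in populations.items() if n < census_threshold}
    large = {occ: n for occ, n in populations.items() if n >= census_threshold}
    remaining = max(total_sample - sum(small.values()), 0)
    large_total = sum(large.values())
    sizes = dict(small)
    for occ, n in large.items():
        sizes[occ] = min(n, round(remaining * n / large_total))
    return sizes


def split_across_agencies(sample_size, agency_counts, top_k=5):
    """Spread one occupation's sample over its top_k largest agencies."""
    top = sorted(agency_counts.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    top_total = sum(count for _, count in top)
    return {agency: round(sample_size * count / top_total) for agency, count in top}


# Illustrative counts only; these are not figures from the CPDF.
populations = {"Clerk-Typist": 60000, "Secretary": 30000,
               "Closed Microphone Reporter": 150}
sizes = allocate_sample(populations, total_sample=1050)
agencies = split_across_agencies(
    sizes["Clerk-Typist"],
    {"A": 300, "B": 200, "C": 100, "D": 50, "E": 50, "F": 10},
)
```

Rounding means the agency shares may not sum exactly to the occupation's sample size, so a real allocation would need a final adjustment step.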

It is hoped that the data analysis will show the degree of
overlap between these occupations and aid in the development of new
selection procedures or support our current examination procedures.
APPENDIX A


CLERICAL OCCUPATIONS IN THE FEDERAL GOVERNMENT


GS Series    Title

134          Intelligence Clerk
203          Personnel Clerk
204          Military Personnel Clerk
301          General Clerk
302          Messenger
304          Information Receptionist
305          Mail and File Clerk
309          Correspondence Clerk
312          Clerk-Steno Reporter
316          Clerk-Dictating Machine Transcriber
318          Secretary
319          Closed Microphone Reporter
322          Clerk-Typist
324          Cold Type Composing Machine Operator
350          Office Machine Operator
351          Printing Clerk
354          Bookkeeping Machine Operator
355          Calculating Machine Operator
356          Data Transcriber
357          Coding Clerk
359          Electric Accounting Machine Operator
382          Telephone Operator
385          Teletypist
394          Communication Clerk
998          Claims Clerk
1021         Office Drafter
1046         Clerk-Translator
1087         Editorial Assistant
1531         Statistical Clerk
2091         Sales Store Clerk
2131         Freight Rate Clerk
2132         Travel Clerk
2133         Passenger Rate Clerk
2134         Shipment Clerk
             Dispatcher
Dickinson, Richard W., Texas A&M University, College Station, Texas.
(Wed. A.M.)


CODAP 80: The New Occupational Analysis Computer System

For the past 2 1/2 years, a new version of the job analysis computer
software system, the Comprehensive Occupational Data Analysis
Programs (CODAP), has been under development at the Occupational
Research Division, Industrial Engineering Department, Texas A&M
University. The time for the system's initial release is almost at hand.
The new system represents a radical departure from existing job analysis
computer software in that particular attention has been given to
making communication between the user and the system as "friendly" as
possible, while at the same time maintaining the system's viability
through its flexibility in processing an occupational database. The
new system's present capabilities will be discussed, with particular
emphasis being given to the future potential the system represents in
the analysis of occupational information.
CODAP80: THE NEW OCCUPATIONAL
ANALYSIS COMPUTER SYSTEM


Richard Dickinson
Occupational Research Division
Industrial Engineering Department
Texas A&M University


INTRODUCTION 


For the past 2 1/2 years, CODAP80, a new job analysis computer software
system, has been under development at the Occupational Research Division of
the Industrial Engineering Department at Texas A&M University. The new system
represents a radical departure from existing job analysis computer software
in that particular attention has been given to making communication between
the user and the system as convenient and "friendly" as possible, while at
the same time providing the job analyst with a database processing tool
flexible enough to meet both present and future applications in the analysis
of occupational information.

CODAP80


CODAP80 is a sophisticated software system for processing occupational
data collected with job inventories. It was designed with the particular
needs of the job analyst in mind. Many aspects of the system will satisfy
demands unique to their work, and much of the system's terminology is oriented
toward them. In addition to its specialized job analysis characteristics, the
data handling and analysis features of the system provide general database
management capabilities.

The CODAP80 computer system provides the job analyst with the tools
necessary to flexibly investigate occupational information pertaining to
classification of employees, training emphasis, work-load distribution,
promotion standards, profile analysis, item analysis and personnel assignment.
For those questions routinely asked of occupational data, users will
find that CODAP80 represents a very convenient and powerful mechanism for
providing answers.

Information collected with job inventories can usually be conceptualized
in the form of a 2-dimensional matrix (see Figure 1), with the columns of the
matrix representing incumbents (or workers) and the rows of the matrix
representing the individual data items collected from each incumbent. These data
items (or variables) may consist of background information (such as incumbent
age, race, sex, job title, etc.) or may consist of incumbent responses to
time spent on work tasks, equipment usage or knowledge, skill and ability
indices. Since CODAP80 makes very few assumptions about the data that has
been collected on incumbents, few, if any, restrictions are imposed as to
the type of job related information that may be surveyed.
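
The row-by-column layout described above can be pictured with a small Python sketch. It is not CODAP80's internal storage format, which the paper does not describe; the variable names, incumbent identifiers and values below are hypothetical and serve only to show rows as data items and columns as incumbents.

```python
# Rows are data items (history variables and task responses);
# columns are incumbents. All names and values are made up.
variables = ["H1_AGE", "H2_GRADE", "T1_TYPE_LETTERS", "T2_FILE_RECORDS"]
incumbents = ["W001", "W002", "W003"]
matrix = [
    [34, 27, 45],    # H1_AGE
    [4, 3, 5],       # H2_GRADE
    [60, 0, 25],     # T1_TYPE_LETTERS (percent of work time)
    [40, 100, 75],   # T2_FILE_RECORDS (percent of work time)
]


def row(variable):
    """One data item across all incumbents."""
    values = matrix[variables.index(variable)]
    return dict(zip(incumbents, values))


def column(incumbent):
    """All data items collected from one incumbent."""
    j = incumbents.index(incumbent)
    return {name: matrix[i][j] for i, name in enumerate(variables)}
```

Summarizing a row answers questions about a task or background item; summarizing a column describes an individual worker's job.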


Processing of the occupational database is performed through the use of
CODAP80's easy to learn, English-like language. Specific procedures in the
language allow the user to conveniently summarize both rows and columns
of a 2-dimensional occupational database, as well as provide report displays
particularly suited to occupational analysis and interpretation.

CODAP80 and Hierarchical Clustering

When investigating occupational information collected with job inventories,
job analysts frequently find it useful to have the incumbent workers surveyed
grouped as a function of their similarity on some job related dimension (often
this dimension is work time across tasks, but could just as well be any other
job related index, such as equipment usage). In this regard, CODAP80 is of
particular utility owing to its capability to produce statistical summaries
and report displays based on criteria established through the use of the system's
powerful hierarchical clustering procedure.

Classification  of  incumbent  workers  can  proceed  in  a  quantitative  and 
systematic  fashion  when  grouping  or  clustering  techniques  are  applied  to  an 
occupational  database.  Interpretation  of  the  results  of  a  cluster  operation, 
though, is often difficult. CODAP80 eases the job analyst's interpretive
burden in two ways:

1) As part of the output generated through execution of CODAP80's hierarchical
clustering procedure, a pictorial dendrogram (or tree-diagram)
is printed that visually illustrates the grouping process (see Figure
2). In effect, the job analyst is provided with a "picture" of how
the incumbent workers were combined as a function of criterion homogeneity.

2) CODAP80 "remembers" the collapsing sequence as incumbent workers are
combined during the clustering process. This "memory" of the grouping
process allows very convenient reference of incumbent data. The
user need only specify to CODAP80 (through the use of the system's
English-like language) the cluster groupings of interest, and the
system automatically directs access and summarization of the database
as a function of this.

The capability to automatically direct database processing based on results
supplied from a cluster operation provides CODAP80 with a distinct advantage
over other software packages when analyzing occupational information
(particularly when investigating information pertaining to classification).
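
The idea of a clustering procedure that "remembers" its collapsing sequence can be sketched as follows. The paper does not state which similarity index or linkage rule CODAP80 uses, so this sketch assumes Euclidean distance between time-spent profiles and average linkage; the group labels, function names and data are hypothetical.

```python
from itertools import combinations
from math import dist


def cluster(profiles):
    """Agglomerative clustering that records every merge.

    profiles maps an incumbent to a list of numbers (for example,
    percent time per task). Returns (groups, merges): groups maps each
    group label to its members, so any stage of the collapsing sequence
    can later be referenced by label.
    """
    groups = {f"G{i + 1}": [name] for i, name in enumerate(profiles)}
    active = set(groups)
    merges = []
    next_id = len(groups) + 1

    def avg_distance(a, b):
        # Average linkage: mean distance over all cross-group pairs.
        pairs = [(x, y) for x in groups[a] for y in groups[b]]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(active) > 1:
        a, b = min(combinations(sorted(active), 2),
                   key=lambda pair: avg_distance(*pair))
        label = f"G{next_id}"
        next_id += 1
        groups[label] = groups[a] + groups[b]
        merges.append((a, b, label))
        active -= {a, b}
        active.add(label)
    return groups, merges


def group_mean(profiles, members):
    """Average profile of the incumbents in one remembered group."""
    width = len(profiles[members[0]])
    return [sum(profiles[m][k] for m in members) / len(members)
            for k in range(width)]


profiles = {"A": [90, 10], "B": [85, 15], "C": [10, 90], "D": [20, 80]}
groups, merges = cluster(profiles)
```

Because every intermediate group keeps its label and membership, a later summary can be requested for a group such as `groups["G5"]` without repeating the clustering run.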

CODAP80 and Report Generation

CODAP80 provides the job analyst with the ability to easily process an
occupational database and then display the processed information in a manner
most amenable to interpretation. The vehicle of communication between the
job analyst and CODAP80 is an English-like language that conveys to the system
the processing and report displays desired.


Suppose, for example, the job analyst, after study of the tree-diagram
displayed in Figure 2, desired to investigate the background characteristics
of those incumbents defined by clustering to be in groups 73 and 67 (enclosed
in brackets, Figure 2). Statistics supplied with the output of the tree-diagram
indicate the two cluster groups to be sizable, with both demonstrating
relatively high between and within homogeneity. To produce a report that
would conveniently display the background (or history) information of the
incumbents in cluster groups 73 and 67, the job analyst need only specify
the following CODAP80 language statements:

BEGIN DATABASE EXECUTE.

PRINT COLUMNS (G73 G67) NOREMARKS NOSUMMARY [illegible]

      ROWS (H1-H9)

      HEADING := 'LISTING-OUT HISTORY VARIABLES H1-H9 FOR THE INCUMBENTS'
                 'IN CLUSTER GROUPS G73 AND G67'

                 'OUTPUT IS IN HIERARCHICAL SEQUENCE ORDER'.

END.


The output generated from execution of the above statements is displayed
in Figures 3 and 4. Immediately apparent from examination of the output is
the disparity in job title (history variable 5) representation between groups
73 and 67. Stemming from this point, the job analyst may then use CODAP80 to
provide information such as those tasks performed by the incumbents in the two
groups and the distribution of their work time across these tasks.

Another topic of interest to the job analyst could be average per task
pay rate of the individuals performing the tasks. For example, the job analyst
may want to investigate the pay rate characteristics per task of the incumbents
performing supervisory functions as opposed to incumbents performing
lower-level, delegated functions. To define the tasks of interest, perform the
desired calculations and produce the report, the job analyst would specify the
following CODAP80 source language statements:

BEGIN DATABASE EXECUTE.

SELECT ROWS MODULE1 (T105-T123)

            'TASKS INVOLVED IN PERFORMING SUPERVISOR FUNCTIONS';

       ROWS MODULE2 (T187-T209)

            'TASKS INVOLVED IN OPERATING ON-LINE [illegible]';

       ROWS MOD1MOD2 (T105-T123, T187-T209)

            'COMBINING MODULE1 & MODULE2 FOR CONVENIENCE'.

AVALUE ROWS MOD1MOD2 FOR (INCUMBENTS) USING M9

       AVGPAYRATE := AVG 'AVERAGE PAY RATE OF INCUMBENTS [illegible]'
       STDPAYRATE := STD 'STD OF PAY RATE OF INCUMBENTS [illegible]'

       NPAYRATE   := N   'NUMBER OF OBSERVATIONS IN CALCULATIONS'

PRINT ROWS (MODULE1 MODULE2) NOSUMMARY [illegible]
      COLUMNS (AVGPAYRATE STDPAYRATE NPAYRATE)

      HEADING := 'OUTPUT LISTING SUPERVISORY AND [illegible]'

      [illegible]


The output generated from execution of the above statements is
displayed in Figures 5 and 6. [Remainder of paragraph illegible in source.]
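
The per-task pay rate summary requested above (average, standard deviation and number of observations) can be approximated in Python. This is a sketch of the kind of calculation involved, not CODAP80's own routine: it assumes that an incumbent "performs" a task when the reported time spent is greater than zero, and it uses the sample standard deviation; the task codes and pay rates are made up.

```python
from statistics import mean, stdev


def pay_stats_by_task(time_spent, pay_rate):
    """Average, standard deviation and count of pay rate for each task.

    time_spent maps task -> {incumbent: time reported on that task};
    pay_rate maps incumbent -> pay rate. Only incumbents who report
    time on a task contribute to that task's statistics.
    """
    report = {}
    for task, by_incumbent in time_spent.items():
        rates = [pay_rate[w] for w, t in by_incumbent.items() if t > 0]
        report[task] = {
            "avg": mean(rates) if rates else None,
            "std": stdev(rates) if len(rates) > 1 else None,
            "n": len(rates),
        }
    return report


time_spent = {
    "T105": {"W1": 20, "W2": 0, "W3": 10},
    "T187": {"W1": 0, "W2": 50, "W3": 0},
}
pay_rate = {"W1": 12.0, "W2": 6.0, "W3": 10.0}
report = pay_stats_by_task(time_spent, pay_rate)
```

Reporting the number of observations beside each average matters because a task performed by only one or two incumbents yields an unstable figure.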


CODAP80's English-like language is easy to learn and use. The language
statements described in this paper represent only a small fraction of the
leverage CODAP80 can supply to occupational analysis. Provided with CODAP80,
the job analyst can actively investigate an occupational database in both
a subtle and powerful manner.
Figure 1. Conceptual representation of hypothetical occupational
database.

Figure 2. First page of output displaying a pictorial dendrogram (or tree
diagram) illustrating the grouping process during hierarchical
clustering.

Figure 3. Output generated by CODAP80 displaying the background (history)
information of the incumbents in cluster group G73.

Figure 4. Output generated by CODAP80 displaying the background (history)
information of the incumbents in cluster group G67.

Figure 5. Output generated by CODAP80 displaying supervisory tasks as
well as pay rate statistics of the incumbents performing
these tasks.

Figure 6. Output generated by CODAP80 displaying [illegible]
as well as pay rate statistics of the incumbents performing
these tasks.
Dodd, B. T., Royal Navy School of Educational and Training Technology,
HMS Nelson, Portsmouth, England. (Fri. A.M.)


Current Directions in Selection Research for the Royal Navy

The computer assisted personnel selection system reported at
MTA 1980 is moving into the procurement stage although the first
implementation will not include most of the self-correcting mechanisms
built into the prototype. Doubts concerning the volume of recruiting
business required in the future have curtailed the money available.

Counselling, the representation of military technical life, and
aids to self-selection have been of major concern in view of the small
intakes required and the consequent disturbances when selectees decline
to join or leave prematurely.

Microprocessor-based tests are being considered in some areas but
it is not expected that paper testing will be supplanted.

Short courses of a few days' length at the Royal Marines Commando
Training Centre are being evaluated as a selection device in an
attempt to reduce the leaving rate due to unexpected physical demands
and life styles.
Selection of Royal Navy Ratings, Women's Royal Naval Service, and Royal Marine
Other Ranks is carried out by permanent 'Careers Advisors' who have retired
from the Service. At this stage they are trained by the Senior Psychologist
(Naval), SP(N), who thereafter supplies them with tests and other selection
instruments and sees that they are properly used. Wastage, advancement and return
of service are monitored by SP(N), who can also track the success of individual
recruiters.

As reported at MTA in 1980, SP(N) has developed a computer model of the
selection system that would be possible if each recruiter could link to a central
database to learn how attractive a particular applicant would be in each of the
jobs for which he was eligible on each of the entry dates for which there was a
vacancy.

This somewhat idealized system has been [illegible] by economic circumstances to
leave a rather different proposal. What is going to be installed is a recruiting
management system with terminals only at regional headquarters. There is no
money for the recruiter to be equipped like an airline booking clerk.

What will be new about the recruiting management system will be an estimate
of the Recruiting Test score that an applicant needs in order to have a good
chance of being allocated a place. The RN has a few jobs difficult to fill
with suitable applicants, but for most branches, eligible applicants exceed
requirements. If one's Recruiting Test score is below the estimated cutting
score, the chances of a place are almost nil. This is expected to save on
recruiting resources.

Yet a further measure being taken to stem the flood of marginal applicants
is to make sure that enquirers have a chance to realize what they are [...]
At present, job preview provisions range from a recruiting
leaflet to a video cassette; for Royal Marines there is a potential recruits
course lasting some two days at which the curious may sample training in the
Royals and the training staff may assess just who should be offered the limited
places.

In between the advertising leaflet and the two-day potential recruits course
there are supplementary booklets, charts and posters. These are being overhauled
and assembled into folios of texts and photographs to portray what it will
be like to live and work in a naval training establishment, one folio for each
trade. Whilst the surfeit lasts, this emphasis on job information will be
the principal method of deterring those who might later discover that they
had made a wrong choice.

For the near future, at least, selection testing will continue to be conducted
using pencils and booklets, answers being marked by hand against a key. A
structured biographical interview lasting about three quarters of an hour is
the main source of non-test information although the interviewer's judgments
are later recorded on a three-dimensional scaling form. Experiments with a
job disposition questionnaire and a job knowledge index are in progress although
it will be some years before follow-up data will be available to validate
these devices.

As a policy for near future action, the naval personnel research effort will
devote rather more resources to seeking proper criterion data on which
to validate selection practices. One line of work is attempting to capture
significant job knowledge in ways which allow computerized end-of-training
testing. Such testing is expected to yield useful data on the absorption and
retention of job knowledge under the current training regime. The art of
criterion test development and its comprehensive practice are not commonplace
in naval training.




hmmUAe*  1— UK  g»<i  i*  Uwtag  a  pwlrini  11a*  of 

tmla|MA  i*l*  411  taadi  Mk  traislac  and  Mltollen  arthnUlM. 
AN EVALUATION OF THE FAIRNESS OF THE FLIGHT APTITUDE SELECTION TEST (FAST)


John  A.  Dohme,  Ph.D. 
Research  Psychologist 
US  Army  Research  Institute 
Fort  Rucker  Field  Unit 


The concept "test fairness" has developed only recently. A major impetus in
the development and application of the concept has come from the publication
of the Uniform Guidelines on Employee Selection Procedures (UGES) in 1978.
The UGES are interpreted as mandating the use of a regression model in
evaluating test fairness. A technique was developed utilizing a regression
model to evaluate the fairness of the Flight Aptitude Selection Test (FAST)
for the groups identified by the UGES: Blacks, American Indians, Asians,
Hispanics, Caucasians and females. The regression of FAST scores on overall
grades in the Initial Entry Rotary Wing (IERW) course was performed for each
of the above groups in comparison with the majority group. Available population
sizes were considered too small to permit a conclusive fairness evaluation
at this time. The fairness evaluation will be repeated semiannually
until minority population sizes permit sufficient power to perform a definitive
analysis.

AN  EVALUATION  OF  THE  FAIRNESS  OF  THE  FLIGHT  APTITUDE  SELECTION  TEST  (FAST) 

"Fairness"  as  a  criterion  for  the  evaluation  of  a  test  or  other  selection 
procedure  is  a  relatively  new  concept.  The  concept  has  evolved  from  the  tech¬ 
nology  of  test  validation  to  answer  the  question,  "Is  this  test/procedure  valid 
for  the  selection  of  minority  as  well  as  majority  applicants?"  Appropriate 
methodology  for  the  evaluation  of  fairness  is  currently  a  matter  for  debate  in 
the  technical  literature  (Ledvinka,  1979).  A  major  impetus  for  the  development 
of fairness methodologies was the publication of Guidelines on Employee Selection
Procedures in 1970 by the Equal Employment Opportunity Commission. In
fact, the most current version, the Uniform Guidelines on Employee Selection
Procedures (UGES) (1978), noted that "The concept of fairness or unfairness of
selection procedures is a developing concept" (14B(8)). Since this technology
is still developmental, this paper will review the rationale and precedent
for the FAST fairness evaluation in some detail.

Technical standards for performing a fairness evaluation are addressed by
both professional and government agencies. The American Psychological Association
(APA) publication, Principles for the Validation and Use of Personnel
Selection Procedures (1975), discusses both technical and ethical implications of
the choice of methodology in fairness research designs. The government publication
referenced above, Uniform Guidelines on Employee Selection Procedures
(UGES), published in 1978, is a codified position agreed upon by the US
Civil Service Commission, the Department of Justice, the EEOC, and the Department
of Labor. It falls under the scope of Title VII of the 1964 Civil Rights Act
and, for that reason, carries the impact of law. Furthermore, the current
version of the UGES was reviewed by the APA prior to publication; thus, it is
a synthesis of professional and governmental guidance on the technical,
ethical and legal aspects of fairness research designs. For these reasons,
this paper will make frequent reference to the UGES.

The  UGES  define  fairness  by  stating  its  obverse:  "When  members  of  one 
race,  sex,  or  ethnic  group  characteristically  obtain  lower  scores  on  a  selec¬ 
tion  procedure  than  members  of  another  group,  and  the  differences  in  scores 
are  not  reflected  in  differences  in  a  measure  of  job  performance,  use  of  the 
selection  procedure  may  unfairly  deny  opportunities  to  members  of  the  group 
that  obtains  the  lower  scores  (Section  14B8a)."  This  definition  has  clear 
implications in the design of a fairness research study in that it specifies
that fairness should be defined in terms of the bivariate distribution of
test (or other selection procedure) scores and job performance scores.
Specifically, fairness is demonstrated by coincident regression of job performance
scores on test scores for a minority group and the majority group. Fairness
does not require that minority performance on the test or on the job be equal
to majority performance, but only that the test (or selection procedure) does
not over- or underpredict minority performance vis-a-vis majority performance.
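
The regression definition of fairness can be illustrated with a short Python sketch. It is not the analysis performed in this study: it fits a separate least-squares line of performance on test score for each group and then measures how far, on average, the majority line misses the minority group's actual performance. The function names and the data are hypothetical, and a formal evaluation would also test the differences in slope and intercept for statistical significance.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_prediction_error(line, x, y):
    """Average of (actual - predicted) when `line` is applied to a group.

    A positive value means the line underpredicts the group's
    performance; a negative value means it overpredicts.
    """
    slope, intercept = line
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return sum(residuals) / len(residuals)


# Made-up test scores (x) and course grades (y) for two groups.
majority_x, majority_y = [1, 2, 3, 4], [2, 4, 6, 8]
minority_x, minority_y = [1, 2, 3], [3, 5, 7]

majority_line = fit_line(majority_x, majority_y)
minority_line = fit_line(minority_x, minority_y)
error = mean_prediction_error(majority_line, minority_x, minority_y)
```

In this made-up case the two lines share a slope but differ in intercept, so the majority line underpredicts minority performance by one grade point at every score; coincident lines would give an error near zero.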

The UGES do not require routine demonstration of the fairness of a selection
procedure for every minority group identified in section 4B (Blacks,
American Indians, Asians, Hispanics and Caucasians). Section 14B(8)(b) states:
"Where a selection procedure results in an adverse impact on a race, sex, or
ethnic group identified in accordance with the classifications set forth in
section 4 above and that group is a significant factor in the relevant labor
market, the user generally should investigate the possible existence of
unfairness for that group if it is technically feasible to do so." In other
words, a demonstration of fairness is required only where:

(1) there is evidence of adverse impact as defined in section 4D of the
UGES;

(2) that adverse impact affects a group identified in section 4B of the
UGES;

(3) the group(s) affected comprise a significant factor in the relevant
labor market, which is defined in section 15A(1)(c) as constituting more than
2% of the labor force in a "relevant labor area";

(4) it is "technically feasible" to investigate the fairness issue.
Technical feasibility is defined in section 14B(8)(c) to include:

(a) sufficient sample sizes to achieve statistical significance;

(b) direct comparability of the samples in terms of the actual jobs
performed.
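
Conditions (1) and (3) lend themselves to a simple numerical screen, sketched below in Python. The sketch assumes the four-fifths rule of section 4D as the test for adverse impact (a group's selection rate falling below 80% of the highest group's rate) and the 2% labor force share quoted above; the function names, group labels and counts are hypothetical, and technical feasibility under condition (4) is not modeled.

```python
def adverse_impact_ratios(applicants, selected):
    """Each group's selection rate divided by the highest group's rate."""
    rates = {g: selected[g] / applicants[g]
             for g in applicants if applicants[g] > 0}
    top = max(rates.values())
    if top == 0:
        # Nobody was selected, so no group is disadvantaged relative to another.
        return {g: 1.0 for g in rates}
    return {g: rate / top for g, rate in rates.items()}


def groups_needing_review(applicants, selected, labor_share,
                          ratio_cutoff=0.8, share_cutoff=0.02):
    """Groups showing adverse impact that also exceed the labor share cutoff."""
    ratios = adverse_impact_ratios(applicants, selected)
    return sorted(g for g, ratio in ratios.items()
                  if ratio < ratio_cutoff
                  and labor_share.get(g, 0) > share_cutoff)


# Made-up applicant and selection counts.
applicants = {"majority": 200, "group_a": 50, "group_b": 40}
selected = {"majority": 100, "group_a": 15, "group_b": 18}
labor_share = {"group_a": 0.10, "group_b": 0.05}

ratios = adverse_impact_ratios(applicants, selected)
flagged = groups_needing_review(applicants, selected, labor_share)
```

A screen of this kind depends entirely on accurate applicant and selection counts by group.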


At this writing, military personnel in DOD agencies do not fall under the
purview of Title VII and thus may not be legally bound to the UGES. However,
the author takes the position that the UGES represent current professional
thinking in this technical area; therefore, they provide appropriate guidance
independent of their status as law.


The issues raised in paragraphs 1-3 above are empirical questions. They
are best answered by descriptive data pertaining to the population of applicants
to US Army flight training. The Fort Rucker Field Unit of ARI began an
investigation of the selection rates of applicants of the groups identified in
section 4B of the UGES. Data were requested from MILPERCEN and RCPAC and a quality
check was performed on the data obtained from the master files. Master file
data were cross referenced with data in the student pilots' flight folders at
the Directorate of Training at Fort Rucker. Taking the black group as an
example, master file data were missing for over 78% of the trainees, i.e., 78%
of individuals who had entered the flight training course did not appear in the
master file. Therefore, it must be concluded that the selection rates prior
to 1980 are indeterminate and adverse impact cannot be assessed.

With  the  advent  of  the  revised  FAST  test  (RFAST)  which  replaced  the  earlier 
form  in  the  field  in  early  1980,  the  data  collection  problem  referenced  above 
has  been  alleviated.  The  RFAST  answer  sheet  requests  information  on  the  sex 
and ethnic status of applicants. All RFAST answer sheets are sent to ARI,
Fort Rucker, for machine scoring and storage in the RFAST archives; thus, all
the information needed to determine whether or not adverse impact exists will
be available at ARI, Fort Rucker. Given that it commonly takes more than one
year between taking the RFAST and graduation from the 34-week training program,
it  will  be  some  time  before  adverse  impact  can  be  determined  for  the  RFAST. 
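
Once selection counts are available, the adverse impact check itself is simple arithmetic. The sketch below assumes the four-fifths rule of thumb from section 4D of the UGES (a group's selection rate below 80% of the highest group's rate is generally taken as evidence of adverse impact). The function names and any counts passed in are hypothetical, not data from this report.

```python
def selection_rate(selected: int, applicants: int) -> float:
    """Fraction of applicants in a group who were selected."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return selected / applicants


def impact_ratio(group_rate: float, highest_rate: float) -> float:
    """Ratio of a group's selection rate to the highest group's rate."""
    if highest_rate <= 0:
        raise ValueError("highest_rate must be positive")
    return group_rate / highest_rate


def indicates_adverse_impact(group_rate: float, highest_rate: float,
                             threshold: float = 0.8) -> bool:
    """Apply the four-fifths rule of thumb: flag ratios below the threshold."""
    return impact_ratio(group_rate, highest_rate) < threshold
```

For example, `indicates_adverse_impact(selection_rate(30, 100), selection_rate(50, 100))` flags a group selected at 30% against a reference group selected at 50%.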

In  the  interim,  the  conservative  assumptions  will  be  made  that  adverse 
impact does exist for all the groups identified by Section 4B of the UGES,
and that each of those groups constitutes more than 2% of the applicant population.
Pursuant to Section 14B(8) of the UGES, a fairness evaluation will
be  undertaken  for  each  group  where  it  is  "technically  feasible"  to  do  so. 
However,  the  issue  of  technical  feasibility  is,  like  the  issue  of  fairness, 
a  matter  of  some  debate  in  the  technical  literature.  As  noted  above,  the 
UGES  discuss  the  issue  of  technical  feasibility  with  reference  to  sample 
size  and  comparability.  In  an  empirical  study  of  the  statistical  power 
associated with various sample sizes, Schmidt, Hunter and Urry (1976) concluded:

"This study demonstrates that sample sizes required to produce
adequate power in empirical validation studies are substantially
larger than has typically been assumed. This finding leads to
the conclusion that, from the viewpoint of sample-size requirements,
criterion-related validity studies are "technically
feasible" much less frequently than is commonly assumed (p. 473)."

Using  the  methodology  developed  by  Schmidt,  Hunter  and  Urry  (1976)  to  estimate 
the  sample  size  required  in  the  present  evaluation,  and  making  the  liberal 
assumptions that (1) the true validity of the FAST test is .50, (2) the
reliability  of  the  Initial  Entry  Rotary  Wing  (IERW)  overall  grade  is  .60  and 
(3)  70%  of  the  applicants  to  the  IERW  program  are  accepted,  128  subjects  per 
group  would  be  required  to  reach  a  power  of  .90  (i.e.,  to  have  a  90%  probability 
of rejecting the null hypothesis if it is indeed false). Thus, from the standpoint
of the Schmidt, Hunter and Urry (1976) article, it is not technically
feasible  to  perform  a  fairness  evaluation  of  the  FAST  until  a  larger  sample 
of  IERW  graduates  is  available. 
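
The kind of sample-size estimate described above can be sketched as follows. This is not the published Schmidt, Hunter and Urry procedure. It is an approximation that assumes (a) the true validity is read as .50, (b) the observed validity is attenuated by criterion unreliability and by direct range restriction on a normally distributed predictor, and (c) power is computed with the Fisher z approximation for a two-tailed test at alpha = .05. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def restricted_sd_ratio(selection_ratio: float) -> float:
    """SD of a standard normal predictor after keeping the top fraction."""
    cut = _STD_NORMAL.inv_cdf(1.0 - selection_ratio)
    # Mean of the truncated distribution (inverse Mills ratio).
    lam = _STD_NORMAL.pdf(cut) / selection_ratio
    return math.sqrt(1.0 + cut * lam - lam * lam)


def expected_observed_validity(true_validity: float,
                               criterion_reliability: float,
                               selection_ratio: float) -> float:
    """Attenuate true validity for unreliability, then for range restriction."""
    r = true_validity * math.sqrt(criterion_reliability)
    u = restricted_sd_ratio(selection_ratio)
    return u * r / math.sqrt(1.0 - r * r + u * u * r * r)


def required_n(observed_r: float, power: float = 0.90,
               alpha: float = 0.05, two_tailed: bool = True) -> int:
    """Sample size needed to detect observed_r (Fisher z approximation)."""
    tail = alpha / 2 if two_tailed else alpha
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - tail)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(observed_r)) ** 2 + 3)


r_obs = expected_observed_validity(0.50, 0.60, 0.70)
n_needed = required_n(r_obs)
```

Under these assumptions the expected observed validity is a little under .30, and the required sample lands in the same neighborhood as the 128 subjects per group cited in the text. A one-tailed test or a different order of corrections would give a somewhat different figure.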

An earlier section of this research report noted that a revised version of
the  FAST  (the  RFAST)  is  presently  being  implemented  in  the  field.  The  version 
of  the  FAST  being  evaluated  for  fairness  in  this  report  has  two  different 
forms  developed  for  implementation  with  commissioned  officers  and  enlisted 
personnel  respectively.  Since  the  two  forms  differ  substantially  in  content 
and number of items, the current fairness evaluation must be conducted separately
for these two populations. There is only one form of the RFAST, which
has  been  developed  for  use  with  both  populations.  Therefore,  future  fairness 
evaluations  will  not  require  separate  commissioned  and  enlisted  samples  which 
will  considerably  ameliorate  the  problem  of  collecting  samples  large  enough 
to  permit  a  conclusive  fairness  evaluation. 

One key issue in the design of a fairness evaluation study is the choice
of a statistical model to guide the minority/majority comparisons. Section
14B8 of the UGES raises the point that the concept of fairness is still evolving
in the literature. Specifically, the choice of a statistical model has been
debated for nearly a decade since the publication of the 1970 version of the
EEOC Guidelines (see Cole, 1972; Hunter and Schmidt, 1974; Hunter, Schmidt
and Rauschenberger, 1977; and Ledvinka, 1979). The current literature focuses
on four models which lead to different operational definitions of
fairness/unfairness:

1.  The  regression  model  (Cleary,  1968)  which  states  that  a  test  is  fair 
if  the  regression  lines  predicting  job  performance  are  the  same  (plus  or  minus 
sampling  variation)  for  minority  and  majority  groups. 

2. The conditional probability model (Darlington, 1971; Cole, 1973) which
states that a test is fair if the probability of being selected is the same
for minority and majority group members who are actually capable of satisfactory
job performance.

3.  The  constant  ratio  model  (Thorndike,  1971)  which  states  that  a  test  is 
fair  if  its  selection  ratio  for  minority  and  majority  groups  is  the  same  as 
the  selection  ratio  using  a  perfectly  valid  test  (or  using  the  criterion 
measure  itself  for  selection). 

4.  The  quota  model  which  states  that  a  test  is  fair  if  its  selection  ratio 
is  the  same  for  all  minority  and  majority  groups  regardless  of  group  performance 
on  the  job. 

While various authors continue to argue the technical and ethical merits
of these models, it has been pointed out by Ledvinka (1979, p. 552) and by
Hunter,  Schmidt  and  Rauschenberger  (1977,  p.  256)  that  the  UGES  clearly 
specify  the  regression  model  as  being  legally  appropriate  in  the  conduct  of 
fairness research. Two UGES passages can be cited to document this point.

"When members of one race, sex or ethnic group characteristically
obtain lower scores on a selection procedure than members of
another group, and the differences in scores are not reflected
in differences in a measure of job performance . . . ." (Section
14B8a).


"If unfairness is demonstrated through a showing that members
of a particular group perform better or poorer on the job than
their scores on the selection procedure would indicate through
comparison with how members of other groups perform, the user
may either revise or replace the selection instrument in accordance
with these guidelines, or may continue to use the selection
instrument operationally with appropriate revisions in its use
to assure compatibility between the probability of successful
job performance and the probability of being selected." (Section
14B8d).

There  is  an  additional,  independent  reason  to  use  the  regression  model 
in  this  fairness  evaluation.  Of  the  four  models,  it  alone  does  not  require 
a  "pass  through"  methodology  in  which  IERW  applicants  are  selected  for 
flight  training  regardless  of  their  FAST  scores.  While  a  pass  through 
methodology  is  technically  appropriate  in  fairness  research,  it  incurs 
a  substantial  increase  in  attrition  rate  over  the  use  of  an  efficacious 
selection  procedure.  Given  that  the  training  costs  in  the  IERW  program 
exceed  $125,000  per  trainee,  the  two  costs  of  a  pass  through  program,  higher 
attrition  costs  and  a  reduced  output  of  trainees,  could  conceivably  cost 
the  government  millions  of  dollars  per  year  and  lead  to  an  even  greater 
shortfall  in  aviators  in  the  field. 

METHOD 

The  subjects  that  comprise  the  minority/female  samples  include  all  IERW 
program  trainees  who  identified  themselves  as  belonging  to  one  of  the  groups 
previously  identified  in  the  UGES  (Black,  Hispanic,  Asian,  American  Indian, 
female) and for whom both FAST and IERW overall grade (OAG) data were available
in US Army Aviation Center (USAAVNC) records. The data collected cover
the time span July 1975 to July 1979.

In  order  to  develop  the  regression  comparison  procedure  and  to  estimate 
the  fairness  of  the  FAST  as  a  predictor  of  performance  in  the  IERW  Program, 
a  sample  of  the  FAST  and  OAG  scores  for  majority  trainees  was  selected. 
During  the  same  time  period  that  scores  were  monitored  for  the  minority 
samples described in this report, a random sample of 10% of majority officers
and 10% of majority WOCs was drawn from the majority population.

The  sample  sizes  for  minority/female  and  majority  commissioned  officers 
and  WOCs  are  presented  in  Table  1. 

The  Introduction  Section  of  this  paper  developed  the  concept  that  the 
evaluation  of  test  fairness  requires  the  comparison  of  minority/female  and 
majority regression lines. A statistical technique was specifically formulated
for this purpose by Gulliksen and Wilks (1950). Additionally, there
is precedent for the application of this procedure under the mandate of the
UGES (Reilly, Zedeck, and Tenopyr, 1979). The Gulliksen-Wilks technique,
which was derived from Neyman-Pearson likelihood ratio test theory, tests
three null hypotheses sequentially (1950, p. 96):


1. H1 is the hypothesis that the populations from which the samples
were drawn have equal standard errors of the estimate (around the least
squares regression line).

2.  H2  is  the  hypothesis  that  the  slopes  of  the  population  regression 
lines  are  the  same. 

3. H3 is the hypothesis that the Y-intercepts of the regression lines
are  equal. 

In  applying  the  technique,  the  three  hypotheses  are  tested  sequentially 
starting with H1. If any hypothesis is rejected, hypothesis testing stops
and  it  is  concluded  that  the  samples  were  drawn  from  different  bivariate 
populations.  If  all  three  null  hypotheses  are  retained,  then  the  samples 
have  the  same  bivariate  dispersion,  slope  and  intercept  and  thus,  coincident 
regression  lines. 
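
The stopping rule described above can be sketched as a short function. It assumes the p-values for the three tests have already been obtained elsewhere; the function name, return convention, and default alpha of .05 are hypothetical choices for illustration.

```python
from typing import Optional, Sequence, Tuple

HYPOTHESES = ("H1", "H2", "H3")  # equal standard errors, slopes, intercepts


def sequential_decision(p_values: Sequence[float],
                        alpha: float = 0.05) -> Tuple[bool, Optional[str]]:
    """Test H1, H2, H3 in order and stop at the first rejection.

    Returns (coincident, rejected): coincident is True only if all three
    null hypotheses are retained; rejected names the first hypothesis
    rejected, or is None when none is.
    """
    if len(p_values) != len(HYPOTHESES):
        raise ValueError("expected one p-value per hypothesis")
    for name, p in zip(HYPOTHESES, p_values):
        if p < alpha:
            # Later hypotheses are not examined once one is rejected.
            return False, name
    return True, None
```

Note that a p-value for H3 is ignored whenever H1 or H2 has already been rejected, which mirrors the sequential logic of the procedure.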

In applying the Gulliksen-Wilks technique to the current fairness evaluation,
a significant problem arises because of the small sample sizes currently
available for ethnic and female IERW trainees. Gulliksen and Wilks state that
their primary purpose is ". . . to present large-sample tests for the hypotheses
considered from the point of view of Neyman-Pearson likelihood ratio test
theory" (1950, p. 94). The smallest sample in the Reilly et al. (1979)
experiments included 45 subjects. A conservative statistician would prefer
to have 100 data points in a "large sample" bivariate distribution. However,
it is clear that the sample sizes in the current research, which
range from a high of 22 Black Officers to a low of 3 Oriental Officers, do
not meet the sample size requirement for the Gulliksen-Wilks procedure.

A search of the statistics literature produced a regression line comparison
procedure which was derived from the analysis of covariance rather
than from Neyman-Pearson likelihood ratio theory. Snedecor and Cochran
(1967, pp. 432-436) present a procedure which tests the same three sequential
hypotheses discussed by Gulliksen and Wilks (1950). This procedure,
while it is sensitive to the usual assumptions made by parametric statistics,
is not based on the assumption of large sample sizes.
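
A covariance-style comparison of two regression lines can be sketched as below. This is a textbook-form illustration, not the program used in this research: it computes the three F ratios (residual variances, slopes, elevations) with their degrees of freedom and leaves the significance lookup to the reader. Function names are hypothetical, and each group is assumed to have at least three points with some residual scatter.

```python
from typing import Dict, Sequence, Tuple


def _sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Corrected sums of squares and cross-products (Sxx, Sxy, Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy


def compare_regression_lines(x1, y1, x2, y2) -> Dict[str, Tuple[float, int, int]]:
    """Return {hypothesis: (F, df_numerator, df_denominator)} for two groups."""
    n1, n2 = len(x1), len(x2)
    sxx1, sxy1, syy1 = _sums(x1, y1)
    sxx2, sxy2, syy2 = _sums(x2, y2)

    # Residual sums of squares about each group's own fitted line.
    res1 = syy1 - sxy1 ** 2 / sxx1
    res2 = syy2 - sxy2 ** 2 / sxx2
    ms1, ms2 = res1 / (n1 - 2), res2 / (n2 - 2)

    # H1: equal residual variances (larger mean square in the numerator).
    if ms1 >= ms2:
        h1 = (ms1 / ms2, n1 - 2, n2 - 2)
    else:
        h1 = (ms2 / ms1, n2 - 2, n1 - 2)

    # H2: equal slopes. Compare separate lines with a common-slope fit.
    within = res1 + res2
    df_within = n1 + n2 - 4
    common = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    h2 = ((common - within) / (within / df_within), 1, df_within)

    # H3: equal elevations. Compare the common-slope fit with a single line.
    sxx_t, sxy_t, syy_t = _sums(list(x1) + list(x2), list(y1) + list(y2))
    total = syy_t - sxy_t ** 2 / sxx_t
    df_common = n1 + n2 - 3
    h3 = ((total - common) / (common / df_common), 1, df_common)

    return {"H1": h1, "H2": h2, "H3": h3}
```

Each F ratio would then be referred to an F table with the returned degrees of freedom, in the H1, H2, H3 order, stopping at the first significant result.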

RESULTS


Table 1 presents sample sizes, means, and standard deviations for the
Commissioned Officer and WOC samples. In addition, the correlation of the
FAST and overall grade for each group and the significance of that correlation
coefficient are shown. At least in part because of the small sample
sizes of the minority and female samples, only 2 of the 10 correlations
attained significance. In both of the majority samples, the FAST proved to
be a significant predictor of overall grade despite the restriction in range
caused by the prior use of FAST scores as a selection criterion (Commissioned
Officers must score at least 155 and enlisted or civilian entrants must score at
least 300² to gain admission to the IERW training program). In reality, the
restriction of range problem applies only to the WOC samples since very few
of the Commissioned Officer applicants score below 155. The lesser restriction
of range in the officer sample is the most probable explanation for
the generally higher correlations in that group, as contrasted to the WOC
samples.

² Since these data were collected, the FAST cutoff score for WOCs was reduced
to 270.

The  three  hypotheses  tested  in  the  fairness  evaluation  concern  the  equality 
of  the  standard  errors  of  the  estimate,  the  slopes,  and  the  Y-intercepts  for 
the minority/female and majority regression lines. The logic of the hypothesis
test procedure requires that the three hypotheses be tested sequentially. That
is, the hypothesis of equal dispersion about the common regression line is
tested first. If that F-ratio reaches significance, the hypothesis test procedure
stops and it is concluded that the two samples are not taken from the
same bivariate population. If the F-test for equality of variance about the
common regression line is nonsignificant, then the second hypothesis is tested,
i.e., the two slopes are compared. Again, if the F-ratio reaches significance,
it is concluded that the two regression lines are not the same. If the F-ratio
is nonsignificant, then the third hypothesis is tested, i.e., the Y-intercepts
(or elevations) of the two regression lines are compared. Again, if the F-ratio
reaches significance, it is concluded that the two samples did not come
from the same bivariate population. Only if all three hypothesis tests yield
nonsignificant F-ratios can it be concluded that the two regression lines are
coincident.

Given the very small sample sizes available at the time this research
was undertaken, it might be misleading to present hypothesis test results.
The statistical power, even in the largest minority/majority comparison, is
not  sufficiently  large  to  ensure  rejection  of  the  null  hypotheses  if  they 
are  indeed  false.  Thus,  these  data  will  be  retained  and  the  fairness  analysis 
will  be  repeated  biannually  until  such  time  as  sufficient  data  are  available 
to  perform  a  conclusive  study. 


DISCUSSION 

As  noted  previously,  the  data  base  for  minority  and  female  IERW  trainees 
is  not  of  sufficient  size  to  permit  drawing  conclusions  regarding  the  fairness 
of  the  FAST  as  a  selection  device.  The  purpose  of  this  paper  is  to  develop 
the  rationale  and  methodology  for  such  a  fairness  evaluation.  Thus,  the 
current  discussion  will  focus  primarily  on  methodological  issues. 

In accordance with the UGES, the fairness of a selection procedure should
be determined by reference to the regression of that selection test (or procedure)
on job referenced criteria. Section 14B(3) of the UGES notes that
training performance is an acceptable criterion under certain conditions:

"Where performance in training is used as a criterion, success
in training should be properly measured and the relevance of
the training should be shown either through a comparison of
the content of the training program with the critical or important
work behavior(s) of the job(s), or through a demonstration
of the relationship between measures of performance in training
and measures of job performance. Measures of relative success
in training include but are not limited to instructor evaluations,
performance samples, or tests."

The IERW training program clearly meets the conditions specified in 14B(3)
by  virtue  of  the  content  of  the  training  program  and  the  measures  of  relative 
success  employed  as  grading  procedures.  The  curriculum  of  the  IERW  Program 
of  Instruction  (POI)  has  been  developed  specifically  to  train  aviators  to 
perform  Army  aviation  missions  in  the  field.  Thus,  the  content  of  the  training 
program  corresponds  very  closely  to  the  critical  work  behaviors  performed  on 
the  job.  Training  grades  are  composed  of  the  three  components  identified  in 
the UGES: instructor evaluations (Instructor Pilot put-up scores), performance
samples (checkrides), and tests (academic examinations). The IERW overall
grade, which is used as the criterion in this research, is a composite of all three
evaluation components. In summary, the design of the current fairness evaluation
is in accordance with the directives of the UGES.

While  the  sample  sizes  for  the  minority/female  groups  presented  in  Table  1 
are  too  small  to  justify  the  drawing  of  inferences  to  the  entire  populations 
of  female  and  minority  aspirant  aviators,  several  points  warrant  discussion. 

For both Hispanic samples (Officer and WOC), the FAST has a nonsignificant
negative  correlation  with  overall  grade.  Inspection  of  the  scatter  diagrams 
in  both  cases  reveals  that,  while  the  general  linear  trend  is  positive  for 
the  entire  sample,  two  or  three  outliers  with  extreme  scores  unduly  influenced 
the  regression  line.  For  example,  in  the  Commissioned  Officer  sample,  the 
individual  with  the  highest  IERW  overall  grade,  89.35,  has  an  unusually  low 
FAST  score,  197.  Expressed  as  standard  scores,  this  individual's  overall 
grade  is  z  =  1.44  whereas  his  FAST  is  z  =  -1.12.  Conversely,  the  individual 
with  the  lowest  overall  grade,  79.39,  has  a  moderately  high  FAST  score,  313. 
Expressed  as  standard  scores,  overall  grade  z  =  -2.71  and  FAST  z  =  .83.  If 
these  two  individuals  are  removed  from  the  distribution,  the  correlation  for 
the  remaining  12  individuals  is  .193.  The  sensitivity  of  this  correlation 
coefficient  to  only  two  data  points  demonstrates  the  inappropriateness  of 
generalizing  from  the  small  minority  and  female  samples  in  the  current  study. 
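
The sensitivity described above is easy to reproduce on a toy data set. The sketch below uses invented scores, not the Hispanic officer data, and the helper names are hypothetical. It shows how two extreme points can reverse the sign of a correlation computed on a handful of cases.

```python
import math
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standard scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: four cases on a rising line plus two outliers.
test_scores = [1.0, 2.0, 3.0, 4.0, 0.0, 5.0]
grades = [1.0, 2.0, 3.0, 4.0, 10.0, -5.0]

r_all = pearson_r(test_scores, grades)               # outliers included
r_trimmed = pearson_r(test_scores[:4], grades[:4])   # outliers removed
```

With the two outliers included the correlation is negative, and with them removed it is strongly positive, which is the same pattern of instability the text describes for samples this small.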

The purpose of this research effort is to establish an appropriate methodology
to evaluate the FAST for fairness. The methodology reviewed in this
paper has been programmed for automated computation. Additionally,
a mechanism has been established to collect data on minority/female and majority
IERW trainees. As more minority/female trainees complete pilot training, the
fairness evaluation will be iteratively performed until sample sizes permit
sufficient statistical power to draw conclusions about the fairness of the FAST.

The "New" Army Officer Evaluation Reporting System

The latest Officer Evaluation Reporting System (OERS) represents
the most substantive change in officer evaluation concept and philosophy
since World War II. It incorporates several new features which
have not been included in previous officer evaluation reporting systems:
rated officer participation, the senior rater concept and a senior
rater profile. Participation by the rated officer in the evaluation
process provides additional information from the rated officer's point
of view to rating officials, encourages counseling early in the rating
period and enhances the effectiveness of organizations by relating
performance to organizational missions. The senior rater concept
increases the role of the most senior rating official from a purely
administrative role to include a critical evaluation of the rated
officer's potential. The senior rater profile provides a comparison of
a specific rating and a senior rater's normal rating tendency (e.g.,
easy or hard) by tracking the rating history of the senior rater and
making it visible to selection boards and DA managers.

THE "NEW" ARMY OFFICER EVALUATION REPORTING SYSTEM


Richard  D.  Doorley 
U.S.  Army  Military  Personnel  Center 


The  U.S.  Army  transitioned  to  a  new  Officer  Evaluation  Reporting 
System  (OERS)  during  the  latter  part  of  1979.  The  development  of  the 
new  system  took  over  six  years  during  which  several  thousand  officers 
from  the  field  participated  in  one  way  or  another  in  the  developmental 
process,  and,  to  a  large  extent,  determined  the  makeup  of  the  new 
system. The process included: an Army-wide field test conducted in
110 Active Army, National Guard and Reserve organizations; an Army-wide
survey; extensive involvement with selection boards and career managers;
a  review  of  the  regulation  and  supporting  policies  with  the  major 
commands;  and  a  review  of  the  performance  evaluation  systems  of  sister 
services,  government,  industry,  academia,  and  many  allied  military 
services. 

The performance evaluation literature reveals many commonalities.
The stated purpose in most organizations is to provide information for
making traditional management decisions: promotion, assignments,
awards, training and demotion. The overall trend is away from evaluating
personality traits and toward management-by-objectives techniques with
employee participation and feedback requirements. All evaluation systems
suffer from varying degrees of inflation, which appears to be positively
related to the importance placed on the ratings. Few organizations
exercise any centralized control over raters, but most have a requirement
for additional evaluators and/or reviewers to ensure fairness in the
evaluation process.

Past  Army  Systems 

Prior  to  the  1920's  Army  officer  management  decisions  were  handled 
at  unit  level  based  primarily  on  personal  knowledge.  After  World  War  I 
officer  personnel  management  was  centralized  and  a  standardized  form 
developed,  Form  67,  which  was  adopted  in  1922.  This  form,  with  minor 
revisions,  was  used  through  World  War  II  when  massive  reductions  put 
extreme  pressure  on  the  Officer  Evaluation  System  (OES)  and  resulted  in 
major changes. Since then there have been eight editions of the OER,
caused primarily by turbulence in officer personnel management and the
concomitant pressures to inflate ratings, especially with an up-or-out
promotion system.

Numerous evaluation techniques and methods have been employed on
past forms: narratives, rating scales, forced choice items, forced
distribution scales, cumulative numerical scores, and "closed" or secret
reports. The success of the various techniques was dependent upon
acceptance by the officer corps, how well they were designed and regulated,
and in some cases when they were used.


In 1973, shortly after the implementation of DA Form 67-7, serious
doubts were raised concerning its effectiveness. It inflated rapidly
as published benchmark scores became floors rather than Army-wide
averages. Lack of confidence and support from the officer corps and
leadership led further to its demise; consequently, the development of
a follow-on OER began early in 1973.

The Officer Evaluation Reporting System (OERS) is a subsystem of the
Officer Evaluation System. It includes procedures for organizational
evaluation chain assessment of an officer's performance and an estimation
of potential for future service based on the manner of that performance.
The major function of the OERS is to provide information from the
organizational  rating  chain  to  Department  of  the  Army  (DA)  for  officer 
personnel  decisions.  The  other  two  functions  are  to  encourage  the 
professional  development  of  the  officer  corps  and  to  enhance  mission 
accomplishment.  To  support  these  functions  the  new  OERS  incorporates 
several  new  features  which  have  not  been  included  in  previous  officer 
evaluation  reporting  systems. 

The  New  Officer  Evaluation  Report 

During  the  field  test,  the  increased  involvement  of  the  rated 
officer  in  the  evaluation  process  was  strongly  endorsed.  Rated  officer 
input  in  the  form  of  developing  the  duty  description  and  performance 
objectives was viewed as an ideal technique for increasing two-way
communication between the rater and rated officer, especially in terms
of developing and clarifying the elements of the rated officer's performance.
Rated officers perceived an increased awareness of the specific
nature of the job as well as an increased opportunity to influence decisions
on mission accomplishment. Rating officials gained valuable
insight into the status of the organization along with additional information
upon which to base an accurate evaluation. Now for the first time
the rated officer is included in the rating process with the use of
DA Form 67-8-1, OER Support Form. Participation by the rated officer
addresses all three functions of the evaluation system. It provides
additional information to the rating chain from the rated officer's point
of view, encourages two-way communication and professional development,
and increases the effectiveness of organizations by focusing performance
more directly on the mission.

The  front  side  of  the  Support  Form  (Figure  1)  contains  rated  officer 
and  rating  chain  identification  and  provides  for  the  rated  officer's 
description  of  his  duties,  major  objectives  and  contributions  during  the 
rating period. The reverse side (Figure 2) provides for rater and intermediate
rater comments on the rated officer's input, as well as instructions for
completing the form. The Support Form is for the use of the rated officer and
rating officials and is not forwarded to DA. At the beginning of the rating
period it is used as a guide for discussion between the rater and rated
officer about the rated officer's duties, responsibilities and performance
objectives for the period. During the evaluation period, it acts to guide
the performance of the rated officer and the counseling and coaching by
the rater. At the end of the rating period, it gives the rated officer an
opportunity  to  provide  the  rating  chain  information  about  his  performance 
from  his  point  of  view. 

At the beginning of the rating period, the rated officer receives a
copy of the OER Support Form which is blank except for the name, grade
and  position  of  his  rater  and  the  positions  of  the  remaining  members  of 
the  rating  chain.  Within  the  first  thirty  days  of  the  rating  period,  the 
rated  officer  and  rater  are  required  to  discuss  the  specific  nature  of  the 
rated  officer's  duties  (his  duty  description)  and  the  focus  or  direction 
of  his  performance  (his  major  performance  objectives).  These  objectives 
can  be  developed  in  several  ways.  They  can  be  set  by  the  rater,  suggested 
by  the  rated  officer  (which  may  be  appropriate  when  the  rater  is  new  or 
recently  assigned)  or  may  be  jointly  developed.  Furthermore,  this  task 
can  be  accomplished  formally  or  informally  depending  on  the  situation, 
style  or  personality  of  the  rater.  The  important  thing  is  that  the 
discussion  takes  place  and  the  rated  officer  gets  started  in  the  proper 
direction. 

During  the  rating  period,  the  rater  and  rated  officer  should  update 
the  duty  description  and  performance  objectives  to  reflect  shifts  in 
emphasis,  additional  missions,  and  other  changes  that  are  likely  to  occur. 
This  updating  process  presents  the  rater  with  an  ideal  opportunity  to 
coach  or  counsel  the  rated  officer  and  provide  additional  guidance  as 
well  as  the  benefit  of  his  knowledge  and  experience.  Concurrently,  the 
rated  officer  is  afforded  an  opportunity  to  provide  relevant  comments, 
perceptions and suggestions concerning his performance and future direction.


At  the  end  of  the  rating  period,  the  rated  officer  receives  a  Support 
Form  with  identification  data  filled  out  to  include  the  names  of  the 
senior  rater  and  the  intermediate  rater  if  one  has  been  included  in  the 
rating chain. The rated officer is required to complete his duty description,
major performance objectives, and significant contributions, and
submit the Support Form to his rating chain. If the rated officer has
kept his information updated and has continued to communicate with his
rater, he is in the best position to provide objective comments concerning
his performance and avoid unrealistic remarks.

The  Support  Form  institutionalizes  a  procedure  that  has  been  a  part 
of effective military leadership for generations: that of clarifying
performance  expectations  for  subordinates  and,  in  the  process,  assisting 
the  subordinate  in  his  professional  development.  Properly  used,  the  Support 
Form  is  beneficial  to  the  rated  officer  and  rating  officials  not  only  in 
attaining a more complete and valid evaluation, but also in improving
performance.

Senior  Rater  Concept 


One  of  the  fundamental  concepts  of  the  new  system  is  to  increase  the 
role  of  more  senior  officers  in  the  rating  chain.  The  intent  is  to  fix  a 
critical  evaluation  responsibility  on  the  individual  in  each  rating  chain 
who  is  at  least  one  level  removed  from  the  immediate  supervision  of  the 
rated  officer,  and  yet  close  enough  to  the  rated  officer  to  be  aware  of  the 
organizational  circumstances  surrounding  his  performance.  By  being  one 
level  removed  from  direct  supervision,  the  senior  rater  has  additional 
experience  to  judge  performance  from  a  broader  perspective,  has  a  wider 
range  of  officers  to  consider  and  compare,  and  is  more  likely  to  weigh 
organizational  requirements  and  actual  performance  results  more  heavily 


than  personal  relationships  and  personality.  He  evaluates  the  potential 
of  the  rated  officer  in  comparison  with  a  sample  population  of  100  officers 
of the same grade. This is accomplished by placing an X in the appropriate
box  in  the  left  hand  portion  of  the  distribution  (See  Figure  3).  While 
there  are  no  quotas  or  rigid  requirements  to  spread  the  rated  officers 
across  all  of  the  boxes,  logic  imposes  its  own  constraints  on  the  number 
of  officers  who  can  be  placed  in  any  particular  block.  For  example,  it 
is extremely unlikely that all of the officers rated by a senior rater
will  always  be  one  in  a  hundred.  Therefore,  the  senior  rater  who  consis¬ 
tently  places  all  his  officers  in  the  top  box  is  distorting  the  system 
and  is  simply  not  supplying  DA  with  credible  rating  information. 

When  the  OER  arrives  at  Department  of  the  Army  Military  Personnel 
Center  (MILPERCEN),  the  rating  history  or  profile  of  the  senior  rater 
is  placed  in  the  boxes  in  the  right  hand  portion  of  the  distribution. 

This  history  shows  exactly  how  the  senior  rater  placed  all  officers  he 
evaluated  of  the  same  grade  as  the  rated  officer  up  to  that  time.  For 
example, a senior rater evaluates a major by placing him in the second
box on the left. When the report is accepted by Department of the Army
MILPERCEN as correct, the number of majors senior rated by that senior
rater  up  to  and  including  that  report  will  be  placed  in  the  boxes  on  the 
right.  This  provides  selection  boards  with  a  comparison  of  the  senior 
rater's  general  tendency  and  how  he  rated  that  particular  officer,  there¬ 
by  addressing  that  age  old  problem  of  hard  versus  easy  raters  (See 
example  at  Figure  4).  In  addition,  this  profile  will  offer  some  protec¬ 
tion  to  officers  rated  early  in  the  new  system  in  the  event  of  general 

inflation  later  on  in  the  life  of  the  system.  Even  after  several  years, 
a  selection  board  will  still  be  able  to  see  exactly  how  the  senior  rater 
was  evaluating  officers  the  day  he  rated  the  individual  officer.  After 
evaluating  potential,  the  senior  rater  makes  any  appropriate  comments  on 
performance,  potential  or  anything  associated  with  his  review  of  the 
entire form. At this point the senior rater forwards the OER through the
servicing  Military  Personnel  Office  to  Department  of  the  Army  MILPERCEN 
and  returns  the  Support  Form  to  the  rated  officer. 
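The profile mechanism described above can be illustrated with a minimal sketch. It assumes boxes are numbered from 1 (top) downward and uses a hypothetical rating history; the function names `build_profile` and `share_at_or_above` are illustrative only and are not part of any MILPERCEN system.

```python
from collections import Counter


def build_profile(prior_boxes, new_box):
    """Count reports per box (1 = top box) up to and including the new report."""
    profile = Counter(prior_boxes)
    profile[new_box] += 1
    return dict(sorted(profile.items()))


def share_at_or_above(profile, box):
    """Fraction of the senior rater's reports placed in `box` or a higher box."""
    total = sum(profile.values())
    at_or_above = sum(count for b, count in profile.items() if b <= box)
    return at_or_above / total


# Hypothetical history: boxes this senior rater gave to majors before this report.
prior = [1, 2, 2, 2, 3, 3, 3, 3, 4]

# The new report places a major in the second box.
profile = build_profile(prior, new_box=2)

print(profile)                        # counts shown beside the box check
print(share_at_or_above(profile, 2))  # how selective a second-box rating is
```

Read this way, a second-box rating from this hypothetical senior rater places the officer in the upper half of the majors he has rated, which is the kind of comparison the profile is meant to support.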

The DA Form 67-8-2, Senior Rater Profile Report (Figure 5) is used
by  Department  of  the  Army  to  track  and  maintain  a  record  of  the  rating 
history  of  the  senior  rater.  This  rating  history  is  expressed  in  terms 
of  the  number  of  reports  rendered  and  the  number  of  different  officers 
evaluated.  It  is  produced  annually  for  all  senior  raters  who  have  senior 
rated  at  least  five  different  officers.  One  copy  of  the  report  is  made 
available  to  the  senior  rater  and  one  copy  is  placed  in  the  senior  rater's 
Official  Military  Personnel  File. 

The  purpose  of  the  senior  rater  profile  report  is  to  remind  all  senior 
raters  of  their  responsibility  to  supply  credible  evaluative  information  to 
Department  of  the  Army.  This  is  one  of  a  senior  rater's  most  important 
responsibilities  because  of  its  impact  on  the  future  leadership  of  the  Army. 
Therefore,  because  it  is  an  important  responsibility,  it  is  an  element  of 
performance  and  is  placed  in  the  senior  rater's  performance  fiche  next  to 
his  other  performance  documents.  Thus,  the  senior  rater  profile  report  is 
an  indication  to  selection  boards  as  to  the  degree  to  which  a  senior  rater 
accepts  his  evaluation  responsibilities. 
Discussion


The  senior  rater  concept  was  instituted  to  help  dampen  inflation  by 
taking  the  pressure  off  the  subordinate's  immediate  supervisor  and  giving 
the  critical  potential  evaluation  to  the  more  senior  individual  in  the 
rating  chain  who  has  a  broader  organizational  perspective  and  theoretically 
a  more  objective  basis  for  evaluation.  Some  common  causes  of  inflated 
ratings  -  loyalty  to  the  individual  subordinate,  team  cohesion,  perceived 
superiority  of  subordinates,  the  perception  of  high  levels  of  responsibility 
and  communication  difficulties  -  affect  the  senior  rater  to  a  lesser  degree 
than  the  immediate  supervisor;  however,  the  Army's  up  or  out  promotion 
policy  still  remains  a  pervasive  pressure  even  for  the  senior  rater.  For 
the  first  time  in  history  some  type  of  control  is  maintained  over  ratings 
given  by  senior  raters.  Selection  boards  are  able  to  see  how  senior  raters 
shoulder  and  accept  their  rating  responsibilities  via  the  Senior  Rater 
Profile  Report  which  is  generated  annually  for  senior  raters.  Although  the 
"new"  OER  System  has  been  in  effect  for  almost  two  years  with  over  200,000 
reports  received,  it  is  still  too  early  to  make  a  conclusive  assessment  of 
how  well  it  is  accomplishing  the  objectives  for  which  it  was  designed;  how¬ 
ever,  preliminary  feedback  from  the  field  and  selection  boards  is  very 
positive.  The  use  of  the  OER  Support  Form  appears  to  be  making  a  signifi¬ 
cant  contribution  not  only  to  the  evaluation  process,  but  also  the  goal 
of  better  performance.  Field  reaction  has  been  extremely  positive,  with 
several  benefits  consistently  reported:  increased  awareness  by  rated 
officers  of  their  responsibilities,  a  closer  alignment  of  performance  to 
organizational  missions,  an  opportunity  for  rated  officers  to  remind  rating 
officials  of  what  was  accomplished  during  the  rating  period,  and  the 
availability of specific information to rating officials, making preparation
of the OER easier than it has been in the past.

Feedback from selection boards to date (40 boards, more than 300 members)
has  provided  insight  into  both  positive  and  problematic  areas  of  the  new 
system.  Responses  to  a  selection  board  questionnaire  indicate  a  healthy 
balance  exists  between  the  rater  and  senior  rater  portions  of  the  OER 
reflecting  the  importance  of  both  to  the  evaluation  process.  (See  Figure  6 
and  7  for  complete  OER,  DA  Form  67-8).  There  does  not  appear  to  be  undue 
focus  on  the  senior  rater  portion  of  the  OER.  Early  fears  from  the  field 
that  a  top  block  evaluation  was  needed  for  promotion  have  been  dispelled 
by  selection  board  follow-up  studies.  In  almost  a  full  year's  cycle  of 
selection  board  deliberations,  more  than  half  of  the  selectees  had  less  than 
top  box  evaluations  with  a  range  of  1-5  box  ratings,  even  when  selection 
rates were as low as 52.

Senior  raters  Army-wide  are  shouldering  their  responsibilities  very  well. 
The  vast  majority  appear  to  be  spreading  their  effective,  successful  officers 
over  at  least  the  top  four  boxes.  However,  it  must  be  remembered  that  the 
worth  of  a  senior  rater  evaluation  is  not  based  on  what  all  senior  raters 
throughout  the  Army  did,  but  rather,  on  the  comparison  of  the  box  check  with 
the  individual  senior  rater's  general  rating  tendency  or  profile,  as  amplified 
and  explained  by  his  remarks  concerning  the  rated  officer. 
DATA REQUIRED BY THE PRIVACY ACT OF 1974 (5 U.S.C. 552a)

1. AUTHORITY: Sec 301 Title 5 USC; Sec 3012 Title 10 USC.

2. PURPOSE: DA Form 67-8, Officer Evaluation Report, serves as the primary source of information for officer personnel
management decisions. DA Form 67-8-1, Officer Evaluation Support Form, serves as a guide for the rated officer's performance,
development of the rated officer, enhances the accomplishment of the organization mission, and provides additional
performance information to the rating chain.

3. ROUTINE USE: DA Form 67-8 will be maintained in the rated officer's Official Military Personnel File (OMPF) and
Career Management Individual File (CMIF). A copy will be provided to the rated officer either directly or sent to the
forwarding address shown in Part I, DA Form 67-8. DA Form 67-8-1 is for organizational use only and will be returned to
the rated officer after review by the rating chain.

4. DISCLOSURE: Disclosure of the rated officer's SSAN (Part I, DA Form 67-8) is voluntary. However, failure to verify
the SSAN may result in a delayed or erroneous processing of the officer's OER. Disclosure of the information in Part IIIc,
DA Form 67-8-1, is voluntary. However, failure to provide the information requested will result in an evaluation of the
rated officer without the benefits of that officer's comments. Should the rated officer use the Privacy Act as a basis not
to provide the information requested in Part IIIc, the Support Form will contain the rated officer's statement to that effect
and be forwarded through the rating chain in accordance with AR 623-105.


INSTRUCTIONS 

PART  I:  Identification  —  Self  explanatory 

PART II: Rating Chain - The personnel officer or appropriate administrative office will fill in information based on
the commander's designated rating scheme.

PART IIIa: Rated Officer Significant Duties and Responsibilities - State the normal requirements met in your specific
position as well as any important additional duties. Address the type of work required, rather than frequently changing
specific tasks.

PART IIIb: Rated Officer Major Performance Objectives - List the most important tasks, priorities, and major areas of
concern and responsibility assigned. This is an explanation of how you set out to accomplish the duties described in IIIa.
Ideally these are planned goals that you will work toward in an effort to make a contribution to the accomplishment of the
organization mission; however, they may be in reaction to unpredictable changes. The objectives come from the following
four categories:

ROUTINE - Objectives that address the repetitive, commonplace duties that must be carried out.
These are duties that will produce less visible results but will have serious consequences if not
properly executed.

PROBLEM SOLVING - Objectives that provide for dealing with problem situations. The objective
should plan for or address potential problems so that [illegible] available to deal with them without
disrupting other objectives.

INNOVATIVE - Objectives that create new or improved methods of operation in the organization.

PERSONAL DEVELOPMENT - Objectives that further professional growth of an individual or
his/her subordinates.

PART IIIc: Rated Officer Significant Contributions - Describe the most significant contributions you made during the
rating period. These may have been in support of the objectives established or may highlight other accomplishments
that you feel are important.

PART IV: Rater and/or Intermediate Rater Review and Comment - Insure any remarks are consistent with your
performance and potential evaluation on DA Form 67-8. Signature does not show concurrence with Part III
but indicates that you have reviewed the rated officer's portion of the form.


FIGURE 2. REVERSE SIDE OF OER SUPPORT FORM
CPT Bridges is one of the most effective
Battery  Commanders  in  the  Brigade.  His 
battery  is  frequently  singled  out  for  its 
outstanding  performance  in  training,  on 
field  exercises,  and  during  maintenance 
inspections.  He  should  be  promoted  as  soon 
as  possible. 


FIGURE  3.  SENIOR  RATER  POTENTIAL  EVALUATION 
CPT Bridges is one of the most effective
Battery Commanders in the Brigade. His
battery is frequently singled out for its
outstanding performance in training, on
field exercises, and during maintenance
inspections. He should be promoted as soon
as possible.

FIGURE 4. SENIOR RATER POTENTIAL EVALUATION
WITH  DA  GENERATED  LABEL  AFFIXED 
DEPICTING  CUMULATIVE  SENIOR  RATER 
PROFILE  FOR  CPTs. 

FIGURE 5. EXAMPLE OF ANNUAL SENIOR RATER PROFILE REPORT

FIGURE 7. REVERSE SIDE OF OFFICER EVALUATION REPORT

NROTC Joins the Big Test Movement

ANDREW N. DOW, Ed. D.
Director  of  Naval  Science  Evaluation 
CNET,  Pensacola,  Florida 


SUMMARY 


Traditionally,  CNET  has  allowed  the  local  NROTC  units  autonomy  in 
matters  of  testing  and  grading.  The  high  level  of  students  and 
institutions  assured  adequate  quality.  Recently,  a  few  NROTC 
graduates  were  having  difficulty  completing  the  post-commissioning 
schools,  especially  Surface  Warfare  Officers’  School  (SWOS).  If  NROTC 
is  to  anticipate  who  will  have  SWOS  problems,  a  comprehensive 
instrument  to  evaluate  each  midshipman's  retention  and  comprehension 
of  the  first  three  years  work  is  needed. 

The exam was developed using the philosophy and techniques that have
evolved  with  the  Navy  enlisted  advancement  examination  program. 

Because  the  use  of  the  exam  and  the  population  to  be  tested  are 
different  from  that  of  the  advancement  program,  a  number  of  changes 
became  mandatory.  The  three  hour  exam  has  150  items  in  five  subject 
matter  areas.  Total  standard  scores  and  section  stanines  were 
calculated;  item  analysis  data  were  extracted.  Persons  who  fell  into 
the lowest stanine on any section were marked for remediation. Units
tailored  their  remedial  work  to  those  that  needed  it.  No  unit 
statistics  were  computed  as  there  is  no  desire  to  set  up  inter-unit 
rivalries. 

For  the  1981  exam  and  future  exams,  new  software  developed  by  TAEG  of 
Orlando  will  pinpoint  specific  difficulties  and  calculate  section 
standard  scores  in  addition  to  providing  the  data  produced  in  1980. 
This will enable individuals to zero in on their deficiencies.
Semi-automated production is another goal.

The Naval Reserve Officers' Training Corps (NROTC) consists of
fifty  five  (55)  units  located  on  the  campuses  of  major  universities 
and  colleges.  At  some  of  these  units,  students  from  non-host  institu¬ 
tions  are  "cross-town"  enrolled;  thus  the  midshipmen  of  NROTC  are  pur¬ 
suing  degrees  at  approximately  100  colleges  and  universities.  The 
Naval  Science  training  is  spread  over  the  four  years  of  the  under¬ 
graduate  career;  it  consists  of  eight  courses  and  three  summer 
cruises.  Each  course  is  based  upon  a  Chief  of  Naval  Education  and 
Training  (CNET)  approved  curriculum,  and  is  supervised  by  CNET  with 
the  assistance  of  a  course-coordinating  NROTC  unit.  One  officer 
instructor  attached  to  the  course-coordinating  unit  is  designated  as 
the  point-of-contact  which  makes  him  the  active  course  coordinator. 
Specific  details  of  course  presentation,  testing,  and  grading  are 
local  matters  under  the  purview  of  the  unit  commanding  officer  who  is 
the  university's  Professor  of  Naval  Science. 

The  four  year  Navy-option  program,  which  includes  some  special 
requirements  outside  the  department  of  Naval  Science,  leads  to  a  com¬ 
mission  as  ENSIGN.  Midshipmen  who  are  enrolled  as  scholarship  stu¬ 
dents  are  commissioned  into  the  regular  Navy;  non-scholarship  (college 
program)  students  are  commissioned  into  the  Naval  Reserve.  Scholar¬ 
ship  students  receive  $100  per  month  plus  books  and  tuition.  Newly 
commissioned  naval  officers,  regardless  of  source,  generally  attend  a 
post-accession  school — flight,  supply  corps,  submarine,  surface  war¬ 
fare,  etc.  There  is  also  a  Marine  option  with  some  different 
courses;  this  leads  to  a  commission  as  a  second  lieutenant  in  the 
Marine  Corps  or  the  Marine  reserve. 

Several  years  ago,  it  was  observed  that  a  higher  percentage  of 
NROTC  graduates  were  being  attrited  from  the  Surface  Warfare  Officers' 
School  Basic  Course,  than  were  officers  from  other  sources  (U.S.  Naval 
Academy  and  OCS).  The  Naval  Personnel  Research  and  Development 
Center,  San  Diego  was  tasked  with  investigating  this  matter  and  repor¬ 
ting  their  findings.  This  report  (Crawford)  pointed  the  finger  at  the 
NROTC program on a superficial level, but it had not investigated
the causes or specific sources of the problem. Informal tallies of the
specific  NROTC  units  of  the  attriting  officers  tended  to  show  that 
some  campuses  were  consistent  sources  of  attrited  officers,  and  that 
there  were  a  few  that  were  intermittent  sources  of  attrites  from  the 
Surface Warfare Officers' School. Independently, the Navy Inspector
General's 1978 inspection team suggested that CNET develop end-of-course
tests for all Naval Science courses. CNET chose instead to
develop a corps-wide comprehensive examination to be administered to
all first-class Navy option midshipmen (seniors) early in the academic
year. The dual purpose of this examination is to encourage quality
instruction and to diagnose deficiencies so that they may be remedied
during the final year; it is not a hurdle to be cleared before
commissioning.  Headquarters  is  not  usurping  the  right  of  the  local 
unit  commanding  officer  to  decide  who  should  be  commissioned  and  who 
not.  This  exam,  hopefully,  will  assist  the  staffs  of  the  local  units 
in  identifying  those  midshipmen  that  have  forgotten  some  of  the 
crucial  material  they  previously  learned.  While  there  are  some 
questions  that  check  on  direct,  rote  knowledge  of  nomenclature,  the 
whole  exam  is  slanted  toward  the  broader  aspects  of  information  and 


the  application  of  interacting  items  of  knowledge — it  is  aimed  at 
macro-objectives  rather  than  micro-objectives. 

The  broad  philosophy  of  the  examination  is  similar  to  that  of  the 
successful examinations used in the Enlisted Advancement program.

Both  the  NROTC  and  the  Navy  Enlisted  Advancement  program  assume  that 
everyone  who  takes  the  exam  is  qualified  to  undertake  the  next  pay- 
grade  up  the  ladder.  The  purpose  of  both  examination  programs  is  to 
find  out  who  is  better  qualified;  at  this  point,  the  two  programs  go 
their  own  ways.  NROTC  tutors  those  who  need  it;  the  Advancement 
system  sets  the  less  qualified  aside  to  mature  and  study  on  their  own 
so  that  they  may  do  better  next  time.  Both  programs  use  examinations 
that are designed to distinguish between individuals over a wide range
of performance. In other words, these exams spread the members of the
upper and lower quarters with approximately equal validity.

Strictly  speaking,  the  NROTC  COMPrehensive  could  have  been  cast  as 
a  mastery  type  exam.  But,  there  are  a  number  of  reasons  why  a 
mastery-type  exam  would  be  less  than  satisfactory  for  this  purpose. 
First,  a  realistic  mastery-type  exam  that  includes  only  materials 
which  MUST  be  mastered  will  appear  as  an  easy  waste  of  time  to  the  top 
three-quarters  of  an  elite  population  like  the  seniors  of  the  NROTC 
midshipmen,  and  those  who  do  poorly  are  labeled  "dummies."  Second, 
using  a  truncated  measure  such  as  a  mastery  test  is  analogous  to 
measuring  the  height  of  American  men  with  a  ruler  that  is  65  inches 
long; the scores will be distributed similarly in
both of these cases. Of course, there may be a situation in which a
specific project would need to know how many men fell into each inch
category of height up to and including 65 inches, and how many were
over that figure. The current needs of the NROTC program are such
that a truncated information exam might be satisfactory from some
viewpoints, but it would leave many questions unanswered. Third,
mastery-type exams tend to dwell upon specific minutiae, or the
micro-objectives of an instructional program or of a set of specific
"competencies."  While  the  specific  micro-objectives  of  an 
instructional  program  are  the  necessary  steps  in  the  development  of  an 
understanding  of  the  macro-objective  of  a  series  of  lessons,  it  is  not 
necessary  that  the  person  who  understands  and  comprehends  the 
macro-objective  be  able  to  recall  specific  bits  of  micro-objective 
related  information.  Fourth,  it  is  conceivable  that  the  NROTC 
COMPrehensive  Examination  might,  in  the  future,  serve  as  the  basis  for 
recognizing  those  midshipmen  who  have  achieved  a  high  degree  of 
comprehension  of  the  professional  content  of  the  officer  training 
program.  The  truncated  measure  is  completely  useless  for  this 
purpose.  Fifth,  there  are  grounds  to  argue  that  a  mastery-type  exam 
yields  a  score  that  is  a  tabulation  of  the  number  of  "digital" 
successes  achieved,  and  is  not  a  measure  of  a  continuous  phenomenon. 
The  conventional  objective  examination,  sometimes  referred  to  as 
"norm-referenced",  yields  scores  that  fall  along  a  true  continuum. 
Sixth,  most  consumers  of  test  scores  have  some  understanding  of,  and 
expect,  statistical  analyses  that  utilize  the  parametric  statistics 
developed  by  Galton  and  Pearson.  These  statistics  are  meaningless 
unless  based  upon  normally  distributed  continuous  data  (Treloar). 
Meaningful  statistics  cannot  be  derived  from  the  non-normally 
distributed discontinuous data that tally the specific successes
achieved on a mastery-type test.
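The 65-inch ruler analogy can be made concrete with a short sketch. The heights below are hypothetical values chosen only for illustration; the point is that a truncated measure records every value above its ceiling as the ceiling itself, so it cannot separate individuals in the upper range.

```python
RULER_LIMIT = 65  # inches, the ruler length used in the analogy


def measure_with_short_ruler(height):
    """A ruler can report no value larger than its own length."""
    return min(height, RULER_LIMIT)


# Hypothetical true heights in inches.
heights = [62, 64, 66, 68, 70, 72, 74]
measured = [measure_with_short_ruler(h) for h in heights]

distinct_true = len(set(heights))        # levels a full-length measure separates
distinct_measured = len(set(measured))   # levels the truncated measure separates
at_ceiling = sum(1 for m in measured if m == RULER_LIMIT)

print(measured)
print(distinct_true, distinct_measured, at_ceiling)
```

In this illustration five of the seven men receive the same reading, which mirrors the argument that a mastery-type exam tells little about an elite population's upper three quarters.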

The content of the NROTC COMPrehensive exam is based both on the
curriculum and upon the broad competencies expected of a junior naval
officer. Its score conversion is based upon the performance of a
population of known ability that has been instructed according to a
standard set of curricula. Thus the NROTC COMPrehensive is
curriculum-and-competency based and population referenced. It is not
norm-referenced; NROTC is not developing a set of norms against which
to compare others. The population sets the performance level; thus it
is not criterion referenced. It does yield continuous data that
distribute themselves rather normally along a continuum. Therefore,
valid  standard  deviations,  standard  scores,  correlations,  etc.,  can  be 
computed,  and  used  meaningfully. 

The  techniques  used  to  develop  the  exams  for  the  Naval  Enlisted 
Advancement  system  were  adapted  to  the  special  needs  of  the  NROTC 
examination.  The  advancement  system  uses  fleet-experienced  senior 
petty  officers  to  produce  the  raw  materials  from  which  its  highly 
successful  examinations  are  built.  NROTC,  analogously,  uses  line 
officers with recent experience in the role of OOD, in engineer duty,
and  in  the  weapons  field  to  write  questions  for  the  pool  from  which 
the  exams  are  constructed.  In  contrast  to  the  petty  officers  who  are 
assigned  to  examination  work  for  a  tour  of  two  or  more  years,  the  test 
writing is a special duty of the line officers who instruct in the
NROTC  units.  Periodically,  these  officer  instructors  are  invited  to 
prepare  and  submit  one  (more  if  they  choose)  test  question  for  each 
lesson  in  the  courses  they  are  responsible  for.  There  is  no  point  in 
delving into the actual nitty-gritty of building and publishing the
exams; suffice it to say that NROTC's procedures are derived from
those  used  with  the  advancement  system.  In  the  semi-final  stages,  the 
exam  as  a  whole  is  reviewed  and  the  answers  checked  by  several  line 
officers. 

Specifically,  why  is  this  a  BIG  examination?  First,  it  is  big 
because  of  its  scope — it  covers  the  high  points  of  three  years  of 
Naval  Science  courses.  These  courses  are  the  introduction  to  the  Navy 
and  its  traditions,  Naval  Ships'  Engineering  Systems,  Naval  Ships' 
Weapons  Systems,  Navigation,  Ship  Operations,  and  Seapower  and  Naval 
History.  Secondly,  it  is  a  three  hour  exam  with  150  multiple  choice 
questions.  Lastly,  this  is  a  BIG  examination  because  it  is  used 
nationwide. 

The  examination  consists  of  five  sections  with  thirty  questions  in 
each;  the  Introduction  to  the  Navy  and  the  History  of  Sea  Power  are 
presently  merged  into  a  single  section.  Each  of  the  other  four 
courses is examined by a full section of thirty items. Whenever
practical,  questions  are  problem  oriented.  In  the  section  devoted  to 
ship  operations  there  are  several  problems  to  be  worked  out  on  paper 
maneuvering  boards  (relative  motion  plots).  At  least  two,  and  in  some 
cases as many as four, questions are asked about the results of each
maneuvering board problem. In the 1980 examination each section was a
discrete block of questions; but in the 1981 examination each section


is  spread  from  the  beginning  to  the  end  in  spiral  fashion.  This  is 
part  of  the  attempt  to  make  the  NROTC  examination  as  realistic  as 
possible — in  real  life,  problems  do  not  appear  in  neat  well-classified 
packages. Improved software makes it possible to derive section
scores  and  section  related  item  analysis  data  from  spiraled  sections. 
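Deriving section scores from spiraled sections amounts to keeping a map from each item to its section. The sketch below is only an illustration: the strict rotation (item 1 to the first section, item 2 to the second, and so on) and the short section labels are assumptions, since the source does not give the actual item order.

```python
# Short labels for the five 30-item sections (labels are illustrative).
SECTIONS = ["Intro/Sea Power", "Engineering", "Weapons", "Navigation", "Operations"]
N_ITEMS = 150

# Assumed spiral layout: sections rotate item by item through the whole exam.
ITEM_SECTION = [SECTIONS[i % len(SECTIONS)] for i in range(N_ITEMS)]


def section_raw_scores(correct):
    """Tally right answers per section from a list of 150 booleans."""
    scores = {name: 0 for name in SECTIONS}
    for item, is_right in enumerate(correct):
        if is_right:
            scores[ITEM_SECTION[item]] += 1
    return scores


# Hypothetical answer sheet: everything right except the Navigation items
# that fall in the last 100 items of the exam.
correct = [ITEM_SECTION[i] != "Navigation" or i < 50 for i in range(N_ITEMS)]

scores = section_raw_scores(correct)
total = sum(scores.values())

print(scores)
print(total)
```

Because the map is data rather than layout, the same tally works whether the sections are printed as discrete blocks, as in 1980, or spiraled, as in 1981.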

The  1980  NROTC  examination  papers  were  processed  using  the 
hardware  and  software  used  for  some  of  the  examinations  in  the  enlis¬ 
ted  advancement  system.  The  system  yielded  a  total  raw  score,  section 
raw  scores,  a  Navy  standard  score  conversion  for  the  total  score,  and 
section  stanines  for  each  participant.  A  graphic  frequency  distri¬ 
bution  portrayed  the  overall  performance  of  all  participants;  it  was 
augmented  by  parametric  descriptive  statistics.  The  system  also 
yielded item analysis data that included the number of responses to each
alternative for the question, the overall difficulty of each
item, and each item's intra-section discriminatory power. Score lists
were  prepared  for  each  individual  NROTC  unit  listing  the  score  data 
for  each  of  the  midshipmen  attached  to  the  specific  unit.  Any  parti¬ 
cipant  that  scored  in  the  lowest  stanine  on  any  section  was  tagged  for 
remediation  in  the  specific  field.  This  cut-off  point  approximates 
the  Navy's  time-honored  passing  score  of  -1.5  sigmas  (standard  score 
of  35).  Unit  averages  and  other  unit-specific  data  were  not  computed; 
the  purpose  of  the  exam  is  the  improvement  of  the  individual  rather 
than  inter-unit  competition. 
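The score conversions named above can be sketched as follows. This is a minimal illustration under stated assumptions: the standard score is taken to have a mean of 50 and a standard deviation of 10 (consistent with -1.5 sigmas equaling 35), the stanine boundaries are the conventional half-sigma cut points, and the raw scores are hypothetical. The actual advancement-system software is not described in the source.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Conventional stanine boundaries in z-score units (an assumption; the
# source does not list its cut points).
STANINE_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def z_scores(raw_scores):
    """Convert raw scores to z-scores using the group's own mean and SD."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)  # population SD; the original software may differ
    return [(x - m) / s for x in raw_scores]


def standard_score(z):
    """Standard score with mean 50 and SD 10, so z = -1.5 maps to 35."""
    return 50 + 10 * z


def stanine(z):
    """Stanine from 1 (lowest) to 9 (highest)."""
    return bisect_right(STANINE_CUTS, z) + 1


# Hypothetical raw scores on one 30-item section.
raw = [10, 18, 19, 20, 20, 20, 21, 22, 30]
zs = z_scores(raw)
stanines = [stanine(z) for z in zs]

# Participants in the lowest stanine are tagged for remediation.
flagged = [i for i, s in enumerate(stanines) if s == 1]

print(stanines)
print(flagged)
```

With these cut points the lowest stanine begins below -1.75 sigmas, slightly stricter than the -1.5 sigma passing score, which is why the text says the cut-off only approximates it.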

The  Chief  of  Naval  Education  and  Training  (CNET)  charged  each 
local  commanding  officer  (Professor  of  Naval  Science)  with  the 
responsibility  of  devising  and  implementing  a  program  fitted  to  the 
needs  of  the  persons  in  his  command  that  needed  remediation.  CNET 
also  requested  that  the  nature  and  results  of  the  remedial  program 
be  reported.  As  little  guidance  came  from  the  command,  there  was  a 
wide  variety  of  programs.  These  programs  cannot  be  evaluated  at 
present  because  the  participants  are  members  of  the  class  of  '81,  and 
their success as Naval officers and post-accession trainees is
unknown. 

The 1981 NROTC COMPrehensive Examination will be processed using
software  developed  by  the  Training  Analysis  and  Evaluation  Group  at 
Orlando  (TAEG).  In  addition  to  the  data  supplied  by  the  advancement 
system software, the new software will produce a list of individuals
(by unit) that shows which items each missed. This will reduce the
time and effort expended in remedial work, and should make that which
is done more effective. There is no reason for a midshipman to repeat
the entire semester's course when he is weak in about a quarter of the
course.
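A list of missed items by unit and individual can be sketched in a few lines. The answer key, unit names, and responses below are hypothetical, and the function name `missed_items_by_unit` is illustrative; it is not the TAEG software, whose design the source does not describe.

```python
from collections import defaultdict


def missed_items_by_unit(answer_key, responses):
    """Return {unit: {midshipman: [numbers of missed items]}}.

    `responses` is a list of (unit, midshipman, answers) tuples; item
    numbers in the output start at 1, as on a printed exam.
    """
    report = defaultdict(dict)
    for unit, name, answers in responses:
        missed = [
            number
            for number, (given, right) in enumerate(zip(answers, answer_key), start=1)
            if given != right
        ]
        report[unit][name] = missed
    return dict(report)


# Hypothetical five-item key and three answer sheets.
key = ["A", "C", "B", "D", "A"]
responses = [
    ("Unit X", "Midshipman 1", ["A", "C", "B", "D", "A"]),
    ("Unit X", "Midshipman 2", ["A", "B", "B", "D", "C"]),
    ("Unit Y", "Midshipman 3", ["B", "C", "B", "A", "A"]),
]

report = missed_items_by_unit(key, responses)
for unit, members in report.items():
    print(unit, members)
```

A unit's staff could then match the missed item numbers to lessons and confine remedial work to those lessons.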

As  the  bank  of  useful  examination  questions  grows,  the  process  of 
putting  them  into  a  magnetically  recorded  bank  will  begin.  Materials 
from  such  a  bank  could  be  used  to  produce  the  exams  in  a  semi-auto¬ 
mated  fashion.  The  person  who  puts  the  new  exam  together  will  be  able 
to  call  for  print-outs  of  items  in  a  specific  area,  then  select  and/or 
revise  them.  After  the  tentative  selection  has  been  made,  the  word 
processor  will  print  out  the  selected  revised  items.  After  some 
hand-massaging of the tentative exam, the word processor will produce
the  originals  of  the  final  product. 
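The semi-automated workflow (call up items in an area, select and revise, then print) might look like the sketch below. Everything here is hypothetical: the `Item` record, its fields, and the placeholder item text are assumptions made for illustration, not a description of the planned question bank.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: int
    area: str
    stem: str


def items_in_area(bank, area):
    """Step 1: list the banked items for one subject area."""
    return [item for item in bank if item.area == area]


def assemble_exam(bank, selected_ids, revisions=None):
    """Steps 2 and 3: take the selected items, applying any revised wording."""
    revisions = revisions or {}
    by_id = {item.item_id: item for item in bank}
    exam = []
    for item_id in selected_ids:
        original = by_id[item_id]
        stem = revisions.get(item_id, original.stem)
        exam.append(Item(original.item_id, original.area, stem))
    return exam


# Hypothetical bank with placeholder text.
bank = [
    Item(1, "Navigation", "placeholder stem 1"),
    Item(2, "Navigation", "placeholder stem 2"),
    Item(3, "Weapons", "placeholder stem 3"),
]

candidates = items_in_area(bank, "Navigation")
exam = assemble_exam(bank, selected_ids=[2, 3], revisions={2: "revised stem 2"})

for item in exam:
    print(item.item_id, item.area, item.stem)
```

The bank itself is left unchanged by a revision, so the original wording stays available for later exams.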

Another proposed future development lies in the area of switching
from "content-oriented" questions to "process-oriented problems."

Some of the present questions are more concerned with "how" than with
"what" or "why," but in the future, more of them will be "how" items
supplemented  with  some  essential  "why"  items.  There  may  even  be  room 
for  a  few  "what"  items.  The  general  validity  of  the  process-oriented 
question,  the  "how"  item,  is  being  established  by  the  College  Outcomes 
Measurement  Project  of  the  American  College  Testing  Program  (COMP). 

As for the effectiveness of NROTC's BIG examination, as stated
earlier, it is too new to evaluate. The internal descriptive statistics
are acceptable for a new untried test. TAEG will perform a
predictive validity study when sufficient data and criteria are available.
This and other studies will be reported in the future when they
are available.

Most occupations involve a number of work characteristics and aptitudes.
Present AQE/ASVAB categories or "areas of enlistment" do not fully describe
the nature of the work being performed within Air Force occupational specialties.
Selection of personnel on the basis of AQE/ASVAB categories frequently
leads, as a consequence, to personnel expecting work to be predominantly of a
particular kind when, in fact, other work requirements are paramount. Because
of this problem, USAFOMC developed a task-by-task look at USAF occupations
using a benchmark scale to categorize the nature of work being performed
within each Air Force specialty. Teams of subject matter experts were presented
lists of commonly performed tasks and assisted occupational analysts
in categorizing the tasks according to the benchmark scales. Most occupations
were found to consist of tasks fitting a predominant category, with several
other categories present as well. Some previously "Electronics" specialties,
for example, were found to be primarily "Electrical," with a high component
of mechanical skills. This paper presents the rationale, methods, and preliminary
results of the USAFOMC effort.

That the aptitude indices of the Armed Services Vocational Aptitude
Battery (ASVAB) are not perfectly descriptive of the nature of the occupations
for which they are used for selection is not a startling revelation.

That, because of long custom, they have taken on a far greater meaning
than they are entitled to is no surprise. But, as a result of the broad and
indefinite meaning attached to aptitude scores, several problems, as well as
misunderstandings, in the recruitment, classification, and utilization of
personnel arise. One effort to provide more insight into occupations is a
study initiated by the Air Force Occupational Analysis Program to describe
the components of the enlisted specialties. The study involves a task-based,
benchmark methodology.

Since 1948 (Massey and Creagor, 1956) the Air Force has used some form
of  classification  test  as  a  selection  device.  These  classification  tests  are 
validated  against  school  success  (Mullins,  et  al,  1981),  a  practice  justified 
at  least  by  the  absence  of  stable  and  relevant  performance  criteria.  In 
addition,  one  might  speculate  about  the  value  of  predictive  or  concurrent 
validation  efforts,  because  of  the  effects  of  experience. 

Even  when  validated  against  school  success,  the  relationship  of  aptitude 
is  unclear.  The  General  Aptitude  Index,  for  example,  often  is  as  efficient  a 
predictor  of  success  in  training  as  another  index  that  is  in  practice  used  to 
select for an Air Force specialty (Vitola, et al, 1973). The bottom line is
that  we  know  little,  but  make  many  assumptions,  about  aptitude  and  job 
performance. 

A  kind  of  magic  is  attributed  to  the  aptitude  indices.  Operators,  for 
example,  contend  that  they  need  personnel  with  a  higher  aptitude  than  the 
minimum used in the classification of a specialty. No one can argue successfully
against the desirability of higher aptitude personnel for most
specialties,  since  there  is  a  dearth  of  information  about  the  relationship  of 
aptitude  to  job  performance.  On  a  more  specific  level,  however,  the  operators 
may  need  more  experienced  personnel  rather  than  higher  aptitude  workers.  The 
problem  here  may  be  one  of  articulation  of  needs. 

The  lack  of  congruence  between  aptitude  indices  and  the  components  of 
the  jobs  for  which  they  are  used  to  select  also  creates  a  greater  problem. 
Personnel  are  counseled  during  recruiting  and  initial  classification  into  the 
Air  Force  on  the  kind  of  Air  Force  specialty  for  which  they  are  qualified. 

They  are  often  recruited  for  an  "electronic"  specialty  because  they  have 
acceptable  Electronic  Aptitude  Index  (EAI)  scores.  Sometimes,  however,  the 
"electronic"  specialty  is  not,  in  fact,  an  electronic  occupation  even  though 
the  EAI  is  the  selection  criterion.  Dissonance  results,  as  in  a  recent  case 
where  an  airman  with  an  EAI  score  of  95  was  assigned  to  a  missile  maintenance 
specialty  for  which  EAI  score  is  the  selection  criterion.  This  airman  believed 
he  was  getting  into  an  electronic  field,  but  he  observed  very  soon  that  in 
his  job,  he  encountered  nothing  electronic.  He  complained.  As  a  result,  we 
made a task-by-task analysis of the specialty. Not only did we find that he
was correct about his job, but also that no electronic task, or task requiring
electronic knowledge, existed in the specialty.

This  experience,  along  with  others,  led  the  Air  Force  Manpower  and 
Personnel Center Classification Branch to request the U.S. Air Force Occupational
Analysis  Program  to  study  all  enlisted  specialties  to  identify  the  components 
of  each.  The  remainder  of  this  paper  describes  this  study. 


APPROACH 


The initial phase of the project involved an extensive review of the
literature to get a good historical perspective of the aptitude research
conducted by the USAF Human Resources Laboratory (AFHRL), primarily relating
to development of the Airman Qualification Examination (AQE) and Armed Services
Vocational Aptitude Battery (ASVAB) and derivation of their categories or subcategories.
In addition, discussions were held with AFHRL researchers familiar
with this area. Once review of the literature was completed, definitions were
tentatively derived for each of the present four AQE/ASVAB categories (Administrative,
Electronic, Mechanical, and General) and subskills or components
involved were related to each area. This phase of the project turned out to
be much more difficult than at first imagined, since in reviewing the literature,
no one "definitive" definition of the four areas could be found, and
in some cases, no agreement was found as to what specific components or subskills
comprised the four aptitude areas.

Several  different  and  varied  specialties  were  then  selected  in  order  to 
test  the  approach  and  definitions.  It  became  quite  apparent  early  on  that  the 
four  aptitude  categories  were  too  broad  to  adequately  describe  the  components 
or work characteristics of a job or specialty. In most cases, it became necessary
to break down the four broad categories into smaller, more meaningful
categories. 

For  example,  in  the  ADMINISTRATIVE  area,  it  was  discovered  that  there  are 
at  least  three  types  of  administrative  tasks.  First,  there  are  those  which 
involve  clerical  work,  such  as  filing,  preparing  and  maintaining  forms  and 
publications,  and  answering  telephones.  Second,  there  are  tasks  which  deal 
with  some  form  of  mathematical  computations,  such  as  those  performed  in 
accounting  or  finance.  And  third,  there  are  those  tasks  which  involve  the 
use  of  office  equipment,  such  as  typewriters,  copy  machines,  or  stenographs. 
Thus  it  became  necessary  to  use  three  categories  in  this  area  rather  than  one. 

In  the  MECHANICAL  area,  it  was  found  that  not  all  mechanical  tasks  were 
of  equal  weight.  Some  tasks  were  simple  and  only  involved  the  use  of  such 
simple  or  common  tools  as  a  hammer  or  screwdriver.  Other  mechanical  tasks 
were  found  to  be  somewhat  more  complex  and  involved  spatial  reasoning  or 
advanced knowledge of a system in order to perform. Thus, some distinction
was made as to relative difficulty of these tasks. Also, equipment operation
(other than office equipment) was considered a mechanical skill. But then
again, there had to be some distinction made between simple equipment operation,
such as driving cars and vans, and more complex equipment operation,
such  as  operating  cranes,  bulldozers,  or  aircraft  K-loaders.  Thus,  further 
breakdowns  were  essential. 

In the ELECTRONIC area, it became necessary to make some distinction
between tasks that were purely electronic and those that were purely electrical,
since there is a difference in the skills and knowledges required
to perform tasks in either area. Also, in this area as well as in the
MECHANICAL area, some distinction was made to differentiate those tasks
that involved a combination of skills or knowledges, such as Mechanical-Electronic,
Electrical-Mechanical, Electronic-Mechanical, and Electronic-Electrical.

The GENERAL category presented some problems in that it was more or less
a "catch-all" category for those areas which were not involved with the other
three categories. Most of the subcategories listed here included simple
physical labor, medical skills, communicative skills (both oral and written),
general procedures or techniques, planning, reasoning, and analyzing, scientific
skills, and special talents (such as illustrating). But even here, it
was necessary to provide some further delineations. For example, in the
Medical subcategory, tasks were found to relate to either patient care or
patient interaction, medical lab equipment operation, or medical procedures
conducted in a medical lab or operating room. Thus, three additional subcategories
were listed.

In  all,  26  subcategories  in  four  broad  areas  were  finally  listed.  For 
each  of  the  26  components  or  subcategories,  benchmark  tasks  were  listed. 

(For a complete list of the categories and subcategories, interested personnel
can write for a copy to the USAF Occupational Analysis Program
(OMY), USAF Occupational Measurement Center, Randolph AFB, Texas 78150,
Attention: Dr. Driskill.)

DATA  GATHERING 


Subject matter specialists (SMSs) TDY to the USAF Occupational Measurement
Center to write Specialty Knowledge Tests (SKTs) were used in categorizing tasks
from the various specialties. For each specialty to be reviewed, those tasks
which comprised 50 percent of the total job time for the journeyman (5-skill)
level were selected. This was an arbitrary percentage which was felt to give
a good representation of the technical tasks performed by the population in
a given specialty. In addition, task difficulty data routinely collected on
each specialty were used to help categorize those tasks where difficulty was
a factor.
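The 50 percent selection rule can be sketched as follows. The paper does not say in what order tasks were accumulated; this sketch assumes they were taken from most to least time-consuming until the cumulative share of job time first reached the threshold. The function name, task names, and percentages are hypothetical.

```python
def select_tasks(task_times, threshold=50.0):
    """Pick the most time-consuming tasks until their cumulative share of
    job time first reaches `threshold` percent.

    task_times : {task name: percent of total job time}
    Returns (selected task names, cumulative percent covered).
    Assumption: tasks are taken in descending order of time spent.
    """
    selected = []
    cumulative = 0.0
    ranked = sorted(task_times.items(), key=lambda kv: kv[1], reverse=True)
    for task, percent in ranked:
        if cumulative >= threshold:
            break
        selected.append(task)
        cumulative += percent
    return selected, cumulative


# Hypothetical percent-time-spent figures for journeymen in one specialty.
task_times = {
    "Task A": 22.0, "Task B": 18.0, "Task C": 15.0, "Task D": 12.0,
    "Task E": 10.0, "Task F": 9.0, "Task G": 8.0, "Task H": 6.0,
}
selected, covered = select_tasks(task_times)
```

With these figures, three of the eight tasks are enough to pass the 50 percent mark, which is the point of the rule: a short task list that still represents most of the job.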

Each team of SMSs had the categories explained to it and was
shown the benchmark tasks for each. Definitions were carefully explained
in detail so as to avoid any confusion. As each team went through the
task lists, they were asked to explain what was involved in performing the
tasks, what type of skills or knowledges were involved, etc. Occupational
analysts used their comments in deciding the category which best fit the
established benchmarks. In most cases, a single category was appropriate
for any given task. In some cases, however, a task would involve multiple
categories or components. For example, in the Small Arms specialty, many
of the tasks are general in nature, since they involve instruction on the
use and maintenance of weapons. But there is also a mechanical element to
most tasks, since many of the steps shown in the instruction involved
performing mechanical operations. When this happened, the task was placed
in both categories.
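Once each task carries one or more category labels, a per-specialty tally of the kind used in this study can be sketched as below. The task names and labels are hypothetical, and the rule of taking the most frequent category as the overall one is an assumption about how the tally was read.

```python
from collections import Counter


def tally_categories(task_categories):
    """Count how many tasks fall in each category.

    task_categories : {task name: list of categories assigned to the task}
    A task placed in two categories is counted once in each.
    """
    counts = Counter()
    for categories in task_categories.values():
        counts.update(categories)
    return counts


def overall_category(counts):
    """Return the category with the highest tally (assumed decision rule)."""
    return counts.most_common(1)[0][0]


# Hypothetical categorization for a Small Arms style specialty.
task_categories = {
    "Instruct on weapon use": ["General", "Mechanical"],
    "Instruct on weapon maintenance": ["General", "Mechanical"],
    "Conduct range safety briefing": ["General"],
    "Maintain training records": ["Administrative"],
}
counts = tally_categories(task_categories)
predominant = overall_category(counts)
```

The full tally, and not only the predominant category, is worth keeping, since the secondary categories are what reveal a mechanical component in a nominally general specialty.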

Once all tasks were categorized, the totals for each category were
tallied and a final overall category (Administrative, General, Mechanical, or
Electronic) was listed for the specialty, along with other pertinent findings
or components. The SMSs were asked their general opinion on the current
AQE/ASVAB category and on the categorization results for their AFSC. Where
there were differences found between the current AQE/ASVAB category and the
USAFOMC categorization, they were asked their opinion as to which category
they believed was most appropriate. This exercise tended to further validate
the results of the project.


CONCLUSIONS


In looking at some 87 specialties to date, it is quite evident that
a  project  of  this  nature  is  needed.  While  the  four  AQE/ASVAB  categories 
currently  in  use  are  sufficient  for  many  ladders,  they  are  not  sufficient 
for  describing  others.  The  nature  of  work  performed  today  in  some  enlisted 
specialties  has  changed  drastically  from  that  performed  years  ago  when  the 
original  AQE/ASVAB  categories  were  established.  As  USAFOMC  completes 
categorization  in  each  of  the  approximately  200  major  enlisted  specialties, 
the  information  is  being  turned  over  to  the  Air  Force  Manpower  and  Personnel 
Center's Classification Branch. Where differences exist between the
current ASVAB classification and the USAFOMC categorization, a reexamination
of  the  ASVAB  testing  category  may  be  required. 

By  using  a  task-based  approach  to  examine  the  types  of  work  being 
performed  in  specialties,  it  is  possible  to  provide  better  selection  and 
recruitment information.


ABSTRACT

This  new  interpersonal  skills  training  technology  uses  a  videodisc  player 
controlled by a microcomputer. The videodisc depicts a number of possible interactions
between a new Army lieutenant and his subordinate which might occur
when  the  lieutenant  attempts  to  solve  a  problem  such  as  deficient  subordinate 
performance.  The  leadership  trainee  is  first  presented  background  information 
related  to  the  problem.  The  trainee  then  sees  and  hears  the  subordinate's 
initial  comment  on  the  television  monitor.  A  menu  of  possible  responses  the 
student  might  make  follows  on  the  TV  screen  and  the  trainee  selects  the  one  he 
feels  is  best  by  pointing  to  it  with  a  light  pen.  The  computer  program  causes 
the  videodisc  player  to  move  to  the  point  on  the  videodisc  that  depicts  the 
way  the  subordinate  might  react  if  treated  in  that  manner.  The  subordinate's 
reaction  to  a  given  response  is  designed  to  provide  feedback  about  the  quality 
of  that  response.  One  mode  of  instruction  attempts  to  simulate  an  interpersonal 
interaction  as  closely  as  possible.  Another  adds  additional  feedback  about  the 
quality  of  each  response,  whether  it  is  the  best  response,  and  the  reason  it  is 
correct  or  incorrect.  Initial  reactions  of  individuals  reviewing  the  first  of 
eight  videodiscs  have  been  highly  positive.  An  experimental  evaluation  of  its 
training and assessment potential begins in November 1981.

INTRODUCTION 


Background 

Previous  research  at  the  Army  Research  Institute  Field  Unit  at  Fort  Benning, 
Georgia,  showed  that  a  videodisc  system  could  successfully  train  soldier  technical 
skills  even  when  only  a  fraction  of  the  potential  of  the  videodisc  medium  was 
used  (Holmgren,  Dyer,  Hilligoss,  &  Heller,  1979).  The  current  research  and 
development  effort  at  the  field  unit  more  fully  exploits  videodisc  technology 
by providing simulations of leader-subordinate interactions for realistic training
of interpersonal leadership skills. These videodisc scenarios will allow new
Army  leaders  to  practice  interactions  with  simulated  subordinates  in  situations 
which  now  are  frequently  mishandled  in  actual  Army  settings. 

This  videodisc  interpersonal  skills  training  and  assessment  (VISTA)  project 
was  initially  conceived  as  a  way  to  reduce  the  high  personnel  costs  associated 
with  use  of  assessment  centers  for  assessing  and  developing  leadership  skills. 


This  paper  was  also  presented  at  the  American  Psychological  Association 
Convention and the Society for Applied Learning Technology Convention.

The problem was one of simulating human beings in the many different ways
that they might respond in a leadership interaction. An audiovisual medium
was needed that would allow rapid accurate random access to a large number
of motion sequences. This could not be accomplished satisfactorily prior
to the advent of the videodisc.

Participants in the Project

Three Army agencies, a civilian contractor, and two Army television
studios are working with the Army Research Institute in the development and
evaluation of this new technology for interpersonal skills training. The
Army agencies are the Training Developments Institute of the US Army Training
and Doctrine Command at Fort Monroe, Virginia, which is funding scenario development
and evaluation; the Army Communicative Technology Office at Fort
Eustis, Virginia, which is providing equipment and videodisc mastering; and
the US Army Infantry School at Fort Benning, Georgia, which is providing
leadership subject matter experts. The Litton Mellonics Systems Development
Group at Fort Benning, Georgia, is developing the leadership training scenarios,
integrating the computer and videodisc hardware, and developing the computer
software. They will also carry out the experimental evaluation of the materials.
Video production is being done primarily by the Fort Benning Educational Television
Branch with some assistance from the Training and Audiovisual Support
Center at Fort Gordon, Georgia.

COMPUTER-VIDEODISC  TECHNOLOGY 

Videodisc  technology  provides  television  displays  which  are  much  more 
flexible  than  those  which  come  from  videotape.  Access  from  any  one  frame  or 
sequence  to  any  other  frame  or  sequence  is  less  than  five  seconds  for  the 
videodisc  player  we  are  using.  Less  distant  segments  on  the  disc  can  be  reached 
and  displayed  in  less  than  one  second.  In  addition,  during  forward  and  reverse 
searches,  the  videodisc  player  is  monitoring  the  frame  number.  As  a  result,  the 
exact  frame  can  be  selected  which  corresponds  to  the  beginning  of  a  new  motion 
sequence.  Alternatively,  the  videodisc  player  can  repeat  that  single  frame 
over  and  over  again  for  a  static  display.  (This  single  frame  feature  would 
allow  a  slide-show  of  54,000  separate  pictures.) 

The player can be interfaced to an external microcomputer. This allows
computer controlled branching to different segments of the videodisc. It also
makes it possible to present computer graphics on the video screen for increased
instructional flexibility. In addition, a light pen or touch panel can be added
to the system to permit the student to interact with the display. Finally, a
real-time clock can be used to measure response latencies and to use those
latencies as cues for certain video segments.

Our system consists of an MCA PR-7820 videodisc player interfaced to an
Apple-II  computer  (48K  plus  PASCAL  language  card)  via  a  Colony  Products  VAI 
controller  card.  We  also  use  a  Symtec  light  pen  and  a  Mountain  Hardware 
real-time  clock.  All  software  is  written  in  the  programming  language  PASCAL. 


LEADERSHIP  TRAINING  APPLICATION 


Two  different  modes  of  instruction  have  been  developed.  The 
"experiential"  mode  will  be  discussed  first  followed  by  the  "pedagogical" 
mode. 


In  the  typical  training  situation,  a  new  junior  officer  leadership  trainee 
sits  before  a  television  receiver  holding  the  "light  pen"  that  allows  direct 
interaction  with  the  display.  Typically,  the  television  is  first  used  to  present 
some  written  background  information  to  the  trainee  about  a  soldier  who  presents 
a leadership problem. This might be a new private in the platoon who has financial
problems or an NCO who has been verbally abusing members of his squad. That
individual then appears on the screen and is seen entering the lieutenant's
office or approaching the lieutenant in the field setting. The televised
subordinate  typically  begins  the  interaction,  speaking  directly  to  the  viewer. 

Following  the  background  data  and  this  initial  comment  by  the  simulated 
subordinate,  the  leadership  trainee  is  shown  a  televised  menu  of  possible 
responses  that  a  new  lieutenant  might  make  in  the  situation.  Each  response 
was  carefully  chosen  by  scenario  developers  to  appeal  to  at  least  some  new 
leadership trainees. However, some of the responses are much more appropriate
in the situation than others. The trainee reviews these alternative responses
and points with the light pen to the one believed to be best. Immediately, the
simulated subordinate reappears on the screen, behaving as he probably would if
treated  in  the  manner  that  was  selected  from  the  response  menu.  When  this  video 
segment  depicting  the  simulated  subordinate  is  complete,  a  new  menu  of  responses 
for  the  trainee  appears  on  the  screen  and  the  trainee  selects  a  response  for 
this  updated  situation  with  the  light  pen. 

In this "experiential" mode of instruction, interactions continue between
the leadership trainee and the simulated subordinate for as many as ten exchanges
until the situation is resolved for better or worse. In the latter case, the
simulated subordinate might be last seen on the TV bolting away from the lieutenant
muttering about incompetent second lieutenants. Should the leadership
trainee pause too long prior to responding, the computer would "know" this and
would automatically display the simulated subordinate saying something like
"If you are finished, Sir, I need to get back to the troops."

It is expected that these interactive scenarios with their rapid branching
will cause trainees to react and respond to the subordinate depicted on the TV
in much the same way as they would to a real subordinate. This approach to interpersonal
skills training might provide a potent tool for training leader skills
that unfortunately now are frequently learned only by trial and error on the job.
In the instructional mode described above, the videodisc interactive scenarios
will also provide trial-and-error learning, but the errors will not have serious
negative consequences for the person the leader deals with or for the leader
himself.
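The branching logic of the experiential mode can be sketched as a small state machine. The project's software is written in PASCAL for the Apple II; the Python below is only an illustration, and the segment names, frame numbers, and menu texts are hypothetical. A long pause is represented by the chooser returning None, where the actual system would consult its real-time clock.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Segment:
    """One video segment plus the response menu shown when it ends."""
    frames: Tuple[int, int]  # hypothetical start and end frame numbers
    menu: List[Tuple[str, str]] = field(default_factory=list)  # (response text, next segment)
    on_timeout: Optional[str] = None  # segment played if the trainee pauses too long


def run_experiential(scenario: Dict[str, Segment], start: str,
                     choose: Callable[[str, List[str]], Optional[int]],
                     max_exchanges: int = 10) -> List[str]:
    """Play segments, branching on each menu choice; return the path taken.

    `choose` receives the segment name and menu texts and returns the index
    of the chosen response, or None to signal that the trainee timed out.
    """
    played = []
    current = start
    exchanges = 0
    while True:
        played.append(current)
        segment = scenario[current]
        if not segment.menu or exchanges >= max_exchanges:
            return played  # situation resolved, for better or worse
        choice = choose(current, [text for text, _ in segment.menu])
        if choice is None:
            if segment.on_timeout is None:
                return played
            current = segment.on_timeout
        else:
            current = segment.menu[choice][1]
        exchanges += 1


# Hypothetical scenario fragment.
scenario = {
    "opening": Segment((100, 400),
                       [("Ask what is wrong", "opens_up"),
                        ("Reprimand immediately", "walks_off")],
                       on_timeout="leaves"),
    "opens_up": Segment((401, 900),
                        [("Offer to arrange counseling", "resolved"),
                         ("Dismiss the problem", "walks_off")],
                        on_timeout="leaves"),
    "resolved": Segment((901, 1200)),
    "walks_off": Segment((1201, 1500)),
    "leaves": Segment((1501, 1700)),
}

scripted = iter([0, 0])
path = run_experiential(scenario, "opening", lambda seg, menu: next(scripted))
```

Keeping the scenario as data, separate from the loop that plays it, mirrors the point made in the text that the software, not the disc, determines how the material is presented.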

The second or "pedagogical" mode of instruction provides more feedback to
the leadership trainee. In this mode, the student will first be asked to construct
a response for a given leadership situation, by either writing it down
or thinking it through. Next, the trainee is presented a response menu and
asked to select the response that is closest to the trainee's answer, or is
the best alternative. After selecting an alternative, the student has the
option of previewing the response. If this option is selected, the trainee
is shown the model lieutenant (actor) making that response. This preview
ensures that the trainee is not tricked by verbal-behavioral discrepancies.
If the student does not like the response, the program branches back to the
response menu. After the leadership trainee decides to keep a response, the
videodisc plays the response (camera on subordinate's face), and the trainee
sees the subordinate's reactions during and after the statement. Following
this motion sequence, computer-generated text informs the trainee whether
or not the response selected was the best option and provides precise feedback
about why it was correct or not. If the alternative selected was not the best
choice, the trainee is again presented the response menu but with the incorrect
alternative removed. If it was the best alternative, the student is given the
option of viewing any or all of the wrong alternatives to see why they were
less appropriate. When finished with the first choice point, the trainee is
taken to the second choice with a brief video review to recreate the situation.
In this pedagogical mode the trainee is never allowed to go more than one step
off the "best path."
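The handling of a single choice point in the pedagogical mode can be sketched as follows. This is an illustration only, not the project's PASCAL code: the response texts and feedback strings are hypothetical, and the optional preview and the optional review of wrong alternatives are left out for brevity.

```python
def run_choice_point(feedback, best, choose):
    """Repeat one choice point until the best response is selected.

    feedback : {response text: feedback shown after that response}
    best     : the response text scored as the best alternative
    choose   : callable taking the current menu (a list) and returning a selection
    Returns a list of (selection, was_best, feedback) for each attempt.
    After a wrong choice, the menu is shown again without that alternative.
    """
    if best not in feedback:
        raise ValueError("best response must be one of the menu entries")
    remaining = list(feedback)
    attempts = []
    while True:
        selection = choose(list(remaining))
        if selection not in remaining:
            raise ValueError("selection must come from the menu shown")
        was_best = selection == best
        attempts.append((selection, was_best, feedback[selection]))
        if was_best:
            return attempts
        remaining.remove(selection)


# Hypothetical menu for one choice point.
feedback = {
    "Threaten disciplinary action": "Not best: this ends the discussion early.",
    "Ask the soldier to describe the problem": "Best: this gathers the facts first.",
    "Ignore the issue for now": "Not best: the problem is left to grow.",
}
menus_shown = []
scripted = iter(["Threaten disciplinary action",
                 "Ask the soldier to describe the problem"])


def choose(menu):
    menus_shown.append(menu)
    return next(scripted)


attempts = run_choice_point(feedback, "Ask the soldier to describe the problem", choose)
```

Because the loop only exits on the best response, the next choice point always starts from the best path, which is what keeps the trainee from drifting more than one step away from it.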

The same videodisc can be used for both the "experiential" and the
"pedagogical" modes of instruction, because the computer software dictates the
mode  of  instruction.  Research  is  planned  to  establish  the  optimal  means  for 
combining  these  two  modes  of  instruction. 

PROGRESS  AND  PLANS 

The initial videodisc scenario was completed in the Summer of 1981. All
eight will be completed less than a year later. The scenarios will receive
their initial validation in a leadership course for new Infantry lieutenants.
However, they might be sufficiently general that they could be used for other
Army Branches and possibly leaders in other services.

The eight scenarios might also provide a powerful and inexpensive leadership
assessment tool. All eight videodiscs will provide nearly 100 opportunities
to measure quality of leader responses. Such assessment data could possibly be
used to aid in selecting candidates for the Army's Branch Immaterial Officer Candidate
Course (formerly OCS) or for the Military Academy. The procedure might
also be used as a voluntary refresher course for more experienced Army leaders.

Future  videodisc  developments  are  anticipated  for  training  the  critical 
interpersonal  skills  of  race-relations  officers,  chaplains,  military  police, 
and  senior  officers.  Training  of  tactics  and  combined  training  of  tactics  and 
interpersonal leadership skills are also foreseen.

This paper defines and discusses the difference between recall and recognition.
Examples are presented to illustrate the difference. The difference matters
because modern training programs are computer managed or computer assisted and
make use of machine scored answer sheets. These modern systems have not been capable
of testing recall, only recognition. The graduate of a training program needs to be
able to recall rather than recognize. The system for testing recall by means of a
machine scored answer sheet is explained and demonstrated.

TESTING RECALL BY MEANS OF
MULTIPLE CHOICE TESTS


ABSTRACT

This presentation explains the difference between recall and recognition. The
implications of the difference are discussed. Special attention is given to the
fact that the extensive use of machine scoring causes recognition to be tested
rather than recall. The method for testing recall by means of multiple choice
testing is explained and an example is demonstrated.

EXPLANATION 

Recall and recognition are different mental processes. Recall requires the student
to retrieve a term or word from memory. The student is provided no help; he is asked
for a response and the student must go through the process of searching through his
mental  files  trying  to  find  the  correct  term  or  name.  Recall  is  an  active  process 
and  a  higher  level  of  mental  activity  than  recognition.  Recognition,  on  the  other 
hand,  tends  to  be  a  passive  process.  The  student  is  required  to  select  from  a  list 
presented  to  him.  In  summary:  the  processes  are  different;  recall  is  a  higher  level 
activity  with  more  requirement  for  mental  activity;  recognition  is  a  lower  level 
mental  activity  which  tends  to  be  passive. 

At  this  point  an  attempt  will  be  made  to  provide  some  evidence  of  the  difference 
between  recall  and  recognition.  You  may  have  had  the  experience  of  thinking  that 
you  know  an  answer  to  a  question,  but  not  being  able  to  state  the  answer.  When  you 
see the answer among a set of responses you are able to identify the answer. This
is an example of being able to recognize, but not recall. Let's try some examples.
You will be asked a question; a few moments after the question is asked a transparency
will be projected showing possible responses, one of which is correct.

Who was vice-president during the last four years of Lyndon Johnson's presidential
term?

A.  Hubert  Humphrey 

B.  Walter  Mondale 

C.  William  Miller 

D.  Spiro  Agnew 

E.  Nelson  Rockefeller 

What  is  the  scientific  name  for  the  dog? 

A.  Felis  Catus 

B.  Equus Caballus

C.  Semper Fidelis

D.  Canis  Familiaris 

E.  Fidelis  Fedelis 

What  is  the  name  of  the  outermost  planet  of  our  solar  system? 

A.  Saturn 

B.  Mars 

C.  Jupiter

D.  Pluto 

E.  Mercury 




Next, we will look at the implications of the difference between recall and
recognition. The distinction is often fuzzy in the world of training. We often
write objectives that say "the student will identify..." and this often means
recognize, because the student is required to recognize the correct term in a
4-choice multiple choice question. If the objective says "the student will recall...,"
then the student should be required to recall, not to recognize. In real world
working situations the distinction is important: workers are required to recall,
not recognize. The ordnance technician who needs a new cam pin should be able to
recall the name of the part; he will not be given a list from which to recognize
the needed part. He must recall and then say or write the name of the needed part;
it is not enough for the worker to recognize names of parts. The essential point is
this: it may be more desirable to train students to recall terms than to train them
for recognition.

Modern training programs have been developed to achieve a maximum of efficiency.
To achieve this degree of efficiency, training programs often rely on testing and
evaluation systems that utilize machine scoring, computer assistance and computer
management. This means that multiple choice items, which can be machine scored, are
used extensively in modern training programs. Since extensive use is made of
multiple choice items, most of the testing is testing of recognition rather than
recall. There is a need to be able to test recall by means of multiple choice items.
The remainder of this presentation will describe a method for testing recall by means
of the multiple choice test.

The method requires that the student recall and write out the required response
on an initial response sheet. This response will be the basis for answering a
multiple choice question. The multiple choice question will ask the student to
identify certain numbered letters in the answer. An example will illustrate
the method. Suppose that the question requires the word multimeter as the
correct response. The student writes out the word multimeter on the initial response
sheet. The question asks the student to identify the 2nd and 4th letters of the
term, and the student is given responses such as these:

A.  n,t 

B.  c,n

C.  r,s 

D.  u,t 

The  student  counts  the  first  four  letters  in  the  word  multimeter  and  identifies 
u  and  t  as  the  2nd  and  4th  letters.  The  student  therefore  selects  choice  D  as 
the  correct  answer.  It  may  be  helpful  to  go  through  another,  more  detailed  example. 
Suppose  that  you  are  teaching  about  the  parts  of  a  master  brake  cylinder.  Rather 
than  dealing  with  every  possible  part,  only  six  parts  will  be  discussed.  The  parts 
are  as  follows.  (See  incl  1) 


3.  spacer 

5.  body

6.  filler  cap 

10.  piston  cup 

11.  piston 

15.  push  rod  boot 

Let's review each of the six parts to fix them in your mind...

Next,  let's  take  a  look  at  what  some  questions  about  these  parts  might  look  like. 

The  first  page  we  will  call  the  initial  response  sheet.  It  requires  the  student 
to  write  in  the  names  of  the  parts.  Remember  that  the  initial  response  sheet  is 
not  turned  in  for  evaluation,  but  is  used  as  the  basis  for  answering  multiple 
choice  questions.  (See  incl.  2)  The  instructions  preceding  the  multiple  choice 
questions  would  be  as  follows.  After  you  have  written  the  names  of  the  parts  in  the 
blanks,  use  that  information  to  answer  questions  that  follow.  If  the  part  name 
consists  of  2  or  more  words,  treat  it  as  a  single  long  word. 

Question  1.  Identify  the  2nd  and  5th  letters  of  term  3. 

A.  r,n 
*B.  p,e 

C.  o,e 

D.  s,t 

Question  2.  Identify  the  6th  and  8th  letters  of  term  6. 

*A.  r,a 

B.  s,e 

C.  o,r 

D.  n,l 

This concludes the example. Note that in question 2 the term filler cap was
considered as one word; the 6th letter, r, is the last letter of the first word, and
a is the 8th letter when the two words are combined.
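
The letter-keying step in the examples above can be sketched in a few lines of
Python. This is an illustrative sketch only, not part of the original
presentation; the function name letters_at is hypothetical. It joins a
multiple-word term into one long word and then picks out the numbered letters.

```python
def letters_at(term, positions):
    """Return the letters at the given 1-based positions of a term.

    Multi-word terms are treated as one long word (spaces are dropped),
    as the instructions to the student require.
    """
    word = term.replace(" ", "").lower()
    return ",".join(word[p - 1] for p in positions)


# The three worked examples from the text.
print(letters_at("multimeter", (2, 4)))   # 2nd and 4th letters
print(letters_at("spacer", (2, 5)))       # Question 1, term 3
print(letters_at("filler cap", (6, 8)))   # Question 2, term 6
```

The three calls give u,t, then p,e, then r,a, which are the keyed choices in the
multimeter example and in Questions 1 and 2.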

DEMONSTRATION 

Now let's try the system in something like a real world training situation. The
transparency shows parts of an older model of flame thrower. (See incl 3) Notice
that there is a large spring called an adjusting spring enclosed by a spring case.
At the top end of the spring is the adjusting spring button. The spring pressure
can be adjusted by the adjusting screw at the top of the mechanism. The opposite
end of this pushes against the diaphragm assembly. There is an inlet and an outlet,
with the outlet apparently larger. There is also a smaller spring called the
compensating spring; this spring is on the opposite end from the adjusting spring.
Some other parts are the nozzle, the operating pin, and the body. A total of 11
parts are identified.

You will be provided with 2 documents. (See incl 4 & 5) The first is the initial
response sheet, which is a duplicate of the transparency. The other is the question
sheet with a reproduction of a part of an IBM answer sheet. Fill out the initial
response sheet and use this information to answer the questions on the question sheet.

The correct answers are: 1, E; 2, A; 3, A; 4, D; 5, C; 6, B; 7, D; 8, C; 9, A;
10, D; 11, E.

An obvious reaction to this method is that it requires the student to be able to
spell and therefore it is partly a test of ability to spell. This problem deserves
comment. It is often possible to minimize spelling problems. Diaphragm is a
difficult word to spell; in the example you were required to know only the first and
second letters, so it was possible to get the correct answer without being able to
spell the whole word. The demonstration contained another problem that should be
commented on. In the example there were 3 parts beginning with adjusting; to make
sure the right term was being tested in question 7, the student was required to
string 15 letters together correctly. In an actual teaching-learning situation we
usually sample knowledge rather than testing 100% of items taught. Therefore, in an
actual teaching-learning situation we could have avoided this problem by not testing
all 3 of these parts.

In the final section of this presentation some conventions are suggested. (1)
Ignore capital letters; all responses will be small letters even though one of
the letters of the answer is a capital letter. Example: the answer is Georg Simon
Ohm and the 5th and 6th letters are required. Convention 1 says use g,s rather than
g,S. (2) Treat multiple word responses as a long single word; do not count spaces
between words. (3) Avoid requiring students to learn difficult spelling. If the
spelling is difficult, ask questions about the first 2 or 3 letters of a word.
(4) Avoid consecutive letters if feasible. (5) Avoid use of first letters where
feasible. (6) Construct plausible alternatives; think of plausible alternatives and
use letter combinations of those alternatives. If the correct answer is tinsnips
and the numbers of the correct letters are (2,4), the correct response is (i,s).
A plausible alternative is scissors and the plausible choice is (c,s). (7) Avoid
asking questions about terms that have synonyms. This is not much of a problem
in technical areas because technical terms seldom have exact synonyms.
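
Conventions 1, 2 and 6 lend themselves to a short sketch. The Python below is
illustrative only and is not part of the original method description; the
function names, and the extra alternatives "pliers" and "hacksaw", are
hypothetical. It lower-cases the term, drops spaces, and builds the list of
responses from the correct answer plus letter pairs taken from plausible
alternatives.

```python
def normalize(term):
    """Conventions 1 and 2: small letters only, spaces not counted."""
    return term.lower().replace(" ", "")


def keyed_letters(term, positions):
    """Letters at the 1-based positions, or None if the term is too short."""
    word = normalize(term)
    if max(positions) > len(word):
        return None
    return ",".join(word[p - 1] for p in positions)


def build_choices(answer, alternatives, positions):
    """Correct response first, then distractors from plausible alternatives.

    An alternative is skipped if it is too short for the requested
    positions or if it would duplicate a response already listed.
    """
    key = keyed_letters(answer, positions)
    if key is None:
        raise ValueError("answer is shorter than the requested positions")
    choices = [key]
    for alt in alternatives:
        letters = keyed_letters(alt, positions)
        if letters is not None and letters not in choices:
            choices.append(letters)
    return choices


print(keyed_letters("Georg Simon Ohm", (5, 6)))
print(build_choices("tinsnips", ["scissors", "pliers", "hacksaw"], (2, 4)))
```

In practice the responses would be shuffled before printing the item, so that
the correct response does not always appear first.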

A  final  note  about  this  methodology:  This  is  an  experimental  method;  it  has 
not  been  field  tested,  but  informal  experimentation  shows  promising  results. 

[Incl 1: diagram of a master brake cylinder. Labeled parts: 3. spacer; 5. body;
6. filler cap; 10. piston cup; 11. piston; 15. push rod boot.]

INSTRUCTIONS: Write out the identification of parts 1 through 11 on the initial
response sheet. Use this information to answer questions 1 through 11. Put your
responses on the IBM sheet below. If the part name consists of 2 or more words,
treat it as a single word. The numbers in parentheses after the question indicate
the letters to be used in answering the multiple choice question. The 11 questions
correspond to the 11 parts on the initial response sheet.


 1.  (2,14)    A.  c,e    B.  r,t    C.  o,n    D.  k,l    E.  d,w
 2.  (5,11)    A.  s,p    B.  o,n    C.  e,f    D.  s,s    E.  j,u
 3.  (2,5)     A.  p,n    B.  r,o    C.  g,r    D.  l,w    E.  d,e
 4.  (3,5)     A.  i,n    B.  k,r    C.  n,s    D.  e,a    E.  r,t
 5.  (2,6)     A.  g,s    B.  e,e    C.  u,t    D.  r,s    E.  n,l
 6.  (4,6)     A.  c,k    B.  p,n    C.  r,n    D.  s,l    E.  s,a
 7.  (3,15)    A.  f,n    B.  n,r    C.  o,l    D.  j,g    E.  d,n
 8.  (1,       A.  p,     B.  b,     C.  d,     D.  c,i    E.  s,
 9.  (2,5)     A.  o,l    B.  n,r    C.  s,w    D.  c,r    E.  o,e
10.  (2,4)     A.  l,r    B.  s,r    C.  n,r    D.  o,y    E.  d,e
11.  (3,5)     A.  e,a    B.  j,l    C.  r,t    D.  o,f    E.  l,t

[Reproduction of part of an IBM machine-scored answer sheet: items 1 through 180,
each with five response positions.]

Elig, Timothy W., Gade, Paul A. & Eaton, Newell Kent, US Army Research
Institute for the Behavioral and Social Sciences, Alexandria, Virginia.
(Wed. A.M.)

Performance  Criteria  Development  for  Army  Field  Recruiters 


A variety of measures which have served as criteria of recruiter
performance are discussed. New approaches to productivity measurement
are developed to reflect both the relative value of different recruits
to the Army and the influence of area fertility on recruiter
productivity. Recent ARI research on FY79 productivity of 612 Army
recruiters is presented. The large influence of District Recruiting
Command fertility on individual recruiter productivity (accounting for
32% of the variance) was found to be primarily due to low priority
recruits (those recruits who have low AFQT scores and/or did not get a
high school diploma): DRC average production accounts for 34% of the
variance in production of low priority recruits while it accounts for
less than 9% of variance in production of high priority recruits.
Managerial implications of area fertility adjustments of recruiter
production are discussed. Recruiter reactions to performance appraisal
adjustments for DRC fertility are considered.

For both day-to-day operations and long range planning, the
development and utilization of recruiter performance criteria is of vital
concern to managers of recruiting forces. Under the All Volunteer Force, the
role of the military service recruiter has increased in importance and
recruiting managers have felt the need to improve recruiter productivity. A
key issue in improving recruiter productivity is how to measure it.
We are concerned in this paper with two major changes in
recruiting that have caused us to re-evaluate the way individual recruiter
performance is measured.

The first major change is related to the increased importance of the
individual recruiter since the cessation of the draft. Department of the Army
demands placed upon the Army Recruiting Command are adjudicated through three
levels of this Command and ultimately placed on individual recruiters through
monthly recruiting requirements. With no draft to make up shortfalls and to
motivate individuals to enlist, the Command's concern with individual
recruiter productivity increased sharply. Thus, performance criteria
for recruiters narrowed from a broad concern with the recruiter as a soldier
representing the military in a civilian community to a focus on the number of
enlistments he or she could produce each month. Congressional concern with
the quality of enlistees in the Army has been translated by the Recruiting
Command into monthly recruiting requirements assigned to each Army recruiter.

The monthly mission box assigned each recruiter is a three dimensional matrix
of the number of Non Prior Service individuals the recruiter is to contract by
recruit gender, education level and Armed Forces Qualification Test (AFQT)
category, as well as a separate category of Prior Service applicants.
Education level specifies individuals as being High School Diploma Graduates
(HSDG), High School Seniors (HSSR), or Non High School Graduates (NHSG). AFQT
category specifies whether the person is at or below the 31st percentile on
the Armed Forces Qualification Test.
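
As a rough illustration of the mission box structure just described, the
following Python sketch represents it as a table keyed by gender, education
level and AFQT category, with one extra entry for Prior Service. The labels,
the function name and the requirement figures are hypothetical and are not
taken from Recruiting Command documents.

```python
from itertools import product

GENDERS = ("male", "female")
EDUCATION = ("HSDG", "HSSR", "NHSG")
AFQT = ("above 31st percentile", "at or below 31st percentile")


def empty_mission_box():
    """One cell per gender x education x AFQT combination, plus Prior Service."""
    box = {cell: 0 for cell in product(GENDERS, EDUCATION, AFQT)}
    box["prior service"] = 0
    return box


box = empty_mission_box()
box[("male", "HSDG", "above 31st percentile")] = 2   # hypothetical requirement
box["prior service"] = 1                              # hypothetical requirement
print(len(box), sum(box.values()))
```

With two genders, three education levels and two AFQT categories the box has
twelve Non Prior Service cells and one Prior Service cell.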

The  increased  emphasis  on  individual  recruiter  productivity  is  the  concern 
of  researchers  both  from  the  perspective  of  developing  criteria  for  recruiter 
management  research  (e.g.,  efforts  to  find  improved  recruiter  selection  and 
assignment  factors)  and  from  the  perspective  of  the  understanding  and 
acceptance  individual  recruiters  have  of  the  performance  criteria  used  to 
evaluate  them. 


Role ambiguity and conflict which can result from the setting of
performance standards are important concerns in personnel management. Role
ambiguity refers to the degree to which an individual actually understands
what is required on the job. This is different from role conflict, in which
the individual understands the competing demands which are being made but may
be unable to resolve which demands are more important. Role ambiguity and
conflict have been found to be related to negative states such as
dissatisfaction, stress, impaired performance and inappropriate organizational
behavior (Rizzo, House, & Lirtzman, 1970; Schuler, Aldag, & Brief, 1977;
Keller, 1975).

The second major change that is leading to re-evaluation of performance
criteria for recruiters is the need to develop criteria which are truly
reflective of an individual's performance, and not merely reflective of large
differences in task difficulty associated with geopolitical and socioeconomic
factors of the recruiter's assigned area. For example, Bennett and Haber
(Note 1) found that an urban or rural assignment was an important variable
determining enlistment success of Marine recruiters. In another study of
Marine recruiters, Larriva (Note 2) found that multiple correlations between
various predictors of recruiter success and evaluations of recruiters'
performance improved when geographic and rural vs. urban characteristics were
controlled. Criterion research conducted for the Army during 1973 and 1974
was broadly focused on differences in recruiting "territory fertility". To
account for fertility differences, individual performance was expressed as a
deviation from the mean performance in the individual's territory (Fischl,
Note 3). Average number of recruits per recruiter in a District Recruiting
Command (DRC) was shown to account for 48% of the criterion variance of number
of accessions (Brown, Wood & Harris, Note 4). While these findings
demonstrate the importance of taking geopolitical and socioeconomic variables
into account in criterion development, research remains to be done on exactly
which predictor variables are important to measure and how to use them in
criterion development.
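
The deviation-from-territory-mean criterion mentioned above can be illustrated
with a short Python sketch. This is not the procedure used in the cited
research, only a minimal example of the idea; the function name and all
figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def deviation_scores(records):
    """records: list of (recruiter, territory, contracts) tuples.

    Returns {recruiter: contracts minus the mean contracts of that
    recruiter's own territory}.
    """
    by_territory = defaultdict(list)
    for _, territory, contracts in records:
        by_territory[territory].append(contracts)
    territory_mean = {t: mean(v) for t, v in by_territory.items()}
    return {r: c - territory_mean[t] for r, t, c in records}


# Hypothetical figures, for illustration only.
records = [
    ("A", "urban", 30), ("B", "urban", 20),
    ("C", "rural", 12), ("D", "rural", 8),
]
print(deviation_scores(records))
```

In this made-up example recruiter C, with 12 contracts in a less fertile
territory, scores above recruiter B, with 20 contracts in a more fertile one,
which is the kind of reordering a fertility adjustment is meant to produce.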

Demographic  variables  are  currently  used  in  market  analyses  to  determine 
the  number  of  Qualified  Military  Applicants  (QMA)  in  each  recruiting  station's 
area.  The  QMA  is  used  to  determine  the  mission  requirements  for  each  station. 
Thus  area  socioeconomics  indirectly  influence  one  performance  criterion:  the 
percent  of  mission  objective  the  recruiter  actually  achieves.  Here  again  we 
see  a  strong  potential  for  role  ambiguity  and  lack  of  acceptance  of  the 
relatively  subtle  way  that  "fertility"  now  influences  the  establishment  of 
performance  criteria.  An  even  greater  potential  for  conflict  exists  if 
performance  criteria  are  explicitly  based  on  area  fertility. 

In  this  paper  we  focus  on  how  field  recruiters  and  their  immediate 
supervisors  (recruiting  station  commanders)  feel  about  the  current  performance 
criteria  used  for  Army  recruiters  and  how  they  feel  about  alternative  criteria 
that  might  be  used.  Specifically,  we  investigated  the  level  of  recruiter  and 
station  commander  understanding  and  acceptance  of  current  criteria,  their 
preference  for  other  criteria  and  their  reactions  to  adjusting  performance  on 
the  basis  of  DRC  "fertility." 


Methods 

Data reported in this paper are preliminary. They include only 22 of 50
recruiting stations to be contacted. This data collection effort will be
completed in early November 1981. Complete details of subject selection and
all procedures can be obtained from the authors.


Respondents were 44 recruiters and 22 station commanders from the Western,
Midwestern, and Southwestern Army Recruiting Regions of the United States.
Two recruiters and the station commander were individually interviewed in each
of two stations in each of eleven DRCs.

Performance measurement was the first substantive issue covered in all
these interviews. After the respondents were asked about problems encountered
in filling out our questionnaires and suggested improvements in the surveys,
the interviewer raised the topic of performance rating. Respondents were
asked to read a description of a modified performance rating system that could
be used to compare recruiting performance of recruiters in the different DRCs
(see Appendix 1). After being given the DRC correction¹ for their DRC, each
respondent was asked how they would react to such a system being used.

Respondents were then asked to describe how they believe field recruiter
performance is rated now. This was followed by a question asking recruiters
and station commanders how they would like to see field recruiter performance
rated. Finally, respondents were asked to pick one measure as the best
measure of field recruiter performance. The measures they were asked to
choose from had been included in their surveys. Station commanders had
previously rated each recruiter by an experimental performance report which
included the questions in Appendix 2. Each recruiter had also rated
himself or herself on these items.


Results

Information gathered in the interviews of recruiters and station
commanders is presented below. We first present the information gathered on
respondents' reactions to a modified performance rating system which adjusted
contract totals for DRC fertility. Next we present how station commanders and
recruiters believe recruiters' performance is currently rated and how they
would like to see it rated. Finally, we present the measures recruiters and
station commanders feel are the best measures of recruiter performance.

Reactions  to  a  Modified  Rating  System 

Overall, 23% of respondents (16% of commanders, 26% of recruiters)
accepted the system of DRC corrections. The largest group of respondents (62%
overall, 77% of commanders and 5[?]% of recruiters) rejected a system of DRC
corrections because it did not go far enough and adjust for "within DRC"
fertility differences. Thus 85% of respondents indicated an initial
acceptance of some type of area fertility adjustments. However, 15% of the
recruiters rejected fertility adjustments on the basis of their assertion that
productivity depends only upon the recruiter: his or her effort or sales
ability. Nine percent of the respondents (14% of commanders and 7% of
recruiters) indicated that the system was unfair because it would take
contract credit away from recruiters in DRCs above average in productivity.
This concern was not raised by the other respondents.

¹DRC corrections were computed on a 1975-80 base for a six month production
period as follows: Correction = 6(DRCAV - DRCAV_i), where DRCAV_i is the DRC's
average monthly contract production per recruiter in a ten month base
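
The footnoted correction, Correction = 6(DRCAV - DRCAV_i), can be sketched in
Python as follows. The footnote is incomplete in the source, so two things
here are assumptions rather than the authors' stated procedure: that the
unsubscripted DRCAV is the average across all DRCs, and that the correction is
added to a recruiter's six month contract total. The function names and all
numbers are hypothetical.

```python
def drc_correction(overall_avg, drc_avg, months=6):
    """Correction = months * (overall average - this DRC's average).

    Both averages are monthly contracts per recruiter. Treating the first
    term as the all-DRC average is an assumption; the source footnote
    breaks off before defining it.
    """
    return months * (overall_avg - drc_avg)


def adjusted_contracts(contracts, overall_avg, drc_avg):
    """Six month contract total after adding the DRC correction (assumed use)."""
    return contracts + drc_correction(overall_avg, drc_avg)


# Hypothetical figures: a DRC producing above the overall average.
print(drc_correction(2.0, 2.5))          # negative: credit is taken away
print(adjusted_contracts(15, 2.0, 2.5))
# A DRC producing below the overall average gains credit.
print(drc_correction(2.0, 1.5))
```

Under these assumptions a recruiter in an above-average DRC loses credit,
which is the feature that some respondents described as unfair.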

Current  and  Preferred  Rating  Systems 

In  the  performance  measurement  interview,  respondents  were  asked  how  field 
recruiter  performance  is  now  rated  and  how  they  would  like  to  see  it  rated 
(the  second  and  third  interview  questions  respectively).  Responses  to  these 
questions  were  coded  in  two  ways.  First,  respondents'  views  of  current  and 
preferred  rating  methods  were  coded  for  agreement  or  disagreement.  Second, 
the  current  and  preferred  rating  methods  were  each  coded  in  three  specific 
ways.  The  specific  codings  for  each  were:  a)  the  relative  importance  of 
production  numbers  versus  other  performance  criteria;  b)  the  relative 
importance  of  total  contract  production  versus  categories  of  enlistees  in  the 
mission  box;  and  c)  type  of  criteria  other  than  production  numbers  (e.g., 
recruiter  effort). 

Of the 64 respondents who could be coded, 70% (74% of station commanders,
68% of recruiters) described a preferred rating of field recruiter performance
which was different from the system which they believe is currently used.

Table 1

Perceived Importance of Criteria of Field Recruiter
Performance by Percentage of Respondents

                                        Current                 Preferred
                                  Station                 Station
Criteria Importance Coding        Commanders  Recruiters  Commanders  Recruiters
                                  n=22        n=42        n=20        n=40

Production numbers only               77          81          30          45
Numbers and other criteria -
  Numbers more important              14          14           0           8
Numbers and other criteria -
  Equal importance                     4           2          10          22
Numbers and other criteria -
  Other criteria more important        0           2          30           5
Other criteria only                    4           0          30          20

Total                                 99%         99%        100%        100%

Table 1 shows that the vast majority of recruiters and station commanders
believe that production numbers are the most important measure of recruiter
performance under the current system. This table also shows that the
respondents would prefer to de-emphasize production numbers in measuring
recruiter performance. Furthermore, station commanders and recruiters differ
in the emphasis they place on numbers in their preferred criteria (χ² = 10.2,
4 df, p < .04). Surprisingly, recruiters are less likely to reject production
numbers as a preferred criterion than are station commanders.
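
The chi-square comparison of preferred criteria can be checked approximately
from Table 1. The Python sketch below is not the authors' computation; it
assumes the respondent counts can be recovered by multiplying each "Preferred"
percentage by its n and rounding to whole respondents. The function name is
hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Counts reconstructed from the "Preferred" percentages in Table 1
# (an assumption: percentage times n, rounded to whole respondents).
commanders = [6, 0, 2, 6, 6]     # n = 20
recruiters = [18, 3, 9, 2, 8]    # n = 40
stat, df = chi_square([commanders, recruiters])
print(round(stat, 1), df)
```

With these reconstructed counts the statistic comes to about 10.2 on 4 degrees
of freedom, in line with the value reported in the text.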

Table 2

Perceptions of Production Numbers as Criteria for Field
Recruiter Performance by Percentage of Respondents

                                        Current                 Preferred
Production Number                 Station                 Station
Criteria Importance               Commanders  Recruiters  Commanders  Recruiters
                                  n=22        n=44        n=22        n=44

Numbers not mentioned or
  said to be unimportant              13           2          48          30
Unspecified production numbers        39          35          35          30
Specified production numbers
  Total contracts only                 9          28           9          16
  Contracts and mission box -
    Contracts more important          13           9           0          12
  Contracts and mission box -
    Equally important                 13          14           4           5
  Contracts and mission box -
    Mission box more important         0           5           0           0
  Mission box only                    13           7           4           7

Total                                100%        100%        100%        100%

Table 2 presents the respondents' views on production figures as performance
criteria. The first row of this table reiterates the previous finding that
the respondents preferred ratings less dependent on contract production
figures than they perceive the current ratings to be. The results presented
in this table also indicate that while respondents agree on the importance of
production figures as measures of performance, they do not agree on which
particular production figures are important. This lack of agreement on
production figures is found in both station commanders and recruiters, both in
their understanding of the current rating system and in their preferred rating
system.


Table 3

Frequency of Criteria Other Than Production Figures

Criteria                                  Current    Preferred

Attitude                                     3           1
Appearance                                   3           0
Paperwork                                    2           0
Credibility, knowledge as recruiter          1           1
Effort, volume of applicants worked          4          10
Supervisor ratings                           2          19
Quality of enlistees but not by
  mission box                                0           5
Not on a month by month basis                0           4

The final coding of current and preferred ratings was for performance
measures not using production figures as criteria. Table 3 lists eight
criteria which were cited by two or more respondents. Four other responses
were made only by one individual and are not listed here. Station commanders
and recruiters differed only in the extent to which they preferred supervisory
ratings. Only 14% of recruiters selected supervisory ratings as the preferred
performance measure while 44% of the station commanders preferred this
measure.

Best  Measures  of  Recruiter  Performance 


The last question on performance measures asked respondents to look at the
questions in Appendix 2 and choose one of these questions as the best measure
of field recruiter performance. Table 4 presents the percentage of station
commanders and recruiters who chose each suggested measure as the best
measure of recruiter performance. Suggested measures can be grouped into six
general categories as shown in Table 4. There were sharp differences between
recruiters and station commanders on what recruiters can and should be held
responsible for. Twenty-five percent of the respondents choosing an applicant
processing measure emphasized that the recruiter lacks control over the
quality of the people he/she processes and over whether an individual will
contract. Thirty-six percent of the respondents chose "total contracts" or
"contracts as a percentage of the contract objectives" as the best measure of
recruiter performance because it is "what the job is all about". While many
recruiters communicated a concern with the issue of "quality" recruits, only
22% of the respondents felt certain enough about a recruiter's responsibility
for, and/or ability to influence, recruit quality to choose a quality indicator as
the best measure of recruiter performance.
Table 4

Percentage of Respondents Choosing Each Measure
as the Best Measure of Performance

                                            Station
Measures                                    Commanders   Recruiters
                                            n=22         n=42

Overall Ratings
   1. 5-pt scale                                5            2
  17. 7-pt scale                                0            0
  18. Enlisted Evaluation Report                5            a

Applicant Processing
   2. Contacted for at least 20 min             9           14
   3. Test                                      5            7
   4. Send for physical                         0            2
   5. Send for contracting                      9            2

Contracts and Mission Objective
   6. Contracts                                18           24
   7. % of objective                            9           17

Delayed Entry Program (DEP)
 8/9. Number of DEP losses                      0            2

Objective Quality of Enlistees
  10. High School Diploma Graduates             9            0
  11. AFQT I thru IIIa                          0            0
  12. HSDG and AFQT I thru IIIa                 0            2

Subjective Quality of Enlistees
  13. Quality service for term of
      enlistment                                5           10
  14. They are right for Army                   0           14
  15. Army right for them                       0            0
  16. Become quality NCOs                       0            0

More than one response                         27            2

Total                                         101*          98*

Note: Wording of measures for commander is in Appendix 2.
a This question was not on the recruiter self-report.


Discussion 


Before  examining  the  views  of  station  commanders  and  field  recruiters  on 
new  recruiter  performance  measures,  it  is  best  to  examine  their  understanding 
and  degree  of  acceptance  of  current  performance  criteria. 

Role  Ambiguity:  Understanding  of  Current  Ratings 

Role  ambiguity  is  concerned  with  the  degree  to  which  an  individual 
understands  what  is  required  on  the  job.  Recruiters  and  station  commanders 
seem  to  be  in  agreement  that  field  recruiter  performance  is  currently  rated 
primarily  on  monthly  contract  production.  However,  role  ambiguity  does  seem 
to  exist  with  respect  to  what  extent  current  performance  criteria  are  based  on 
total  contracts  only  or  on  achieving  the  mission  box  objectives.  We  cannot 
currently  tell  whether  this  ambiguity  is  due  to  individual  recruiter  or 
station  commander  lack  of  awareness  or  interest  or  to  some  recent  shift  in 
Command emphasis that has not been clearly communicated to the recruiters and
station  commanders. 

Person-Role Conflict: Acceptance of Ratings

While many respondents stated that they did not know how field recruiter
performance could be rated differently from the way it is currently rated, 70%
of the respondents did express a preference for a rating method which
differed from their perception of the current method. Station commanders in
particular said they would like to see less use of production figures and more
use of the criteria used in other Army assignments (e.g., supervisor ratings of
the total person or soldier). Recruiters sought more recognition of their
efforts and skills in working with applicants. These results indicate a role
conflict among recruiters between their identity as U.S. Army soldiers and the
performance criteria placed on them as U.S. Army recruiters.

A  different  sort  of  person-role  conflict  was  expressed  by  the  17%  of  the 
respondents who chose a subjective enlistee quality measure as the best
measure  of  recruiter  performance.  Many  recruiters  express  a  conflict  between 
having  to  make  total  contract  production  numbers  and  their  personal  desire  to 
enlist  only  individuals  whom  they  feel  are  right  for  the  Army. 

Importance  of  Role  Ambiguity  and  Conflict 

The amount of role ambiguity and role conflict which we found warrants
further efforts to understand the organizational environment of the field
recruiter.

Indications  of  the  importance  of  role  conflict  and  role  ambiguity  in 
decreased  organizational  effectiveness  are  to  be  found  in  the  general 
literature.  Keller  (1975)  found  employee  dissatisfaction  to  increase  with 
role  ambiguity.  Rizzo,  House,  &  Lirtzman  (1970)  suggested  that  role  conflict 
and  ambiguity  resulted  in  stress,  and  that  this  stress,  in  turn,  resulted  in 
dissatisfaction,  poor  performance  and  generally  inappropriate  organizational 
behavior. Schuler, Aldag, & Brief (1977) also found role conflict and
ambiguity  to  be  related  to  negative  affective  states  such  as  dissatisfaction 
and  stress. 

Not all of the recruiter's problems can be solved by eliminating role
ambiguity or conflict. For example, establishing clear performance criteria
based only on the single criterion of production numbers could push the
recruiter toward malpractice. Such pressure might be reduced by placing more
emphasis on performance criteria associated with being an honest,
hard-working soldier.

Shifting  Role  Emphasis  for  Recruiters  and  Acceptance  of  New  Measures 

Much of the ambiguity in respondents' perceptions of current performance
ratings  may  be  related  to  a  newly  emerging  role  for  recruiters.  Recent 
command  emphasis  on  the  quality  of  enlistees  and  recruiter  "ownership"  of 
enlistees  seems  to  mark  the  beginning  of  a  shift  from  the  role  of  the  Army 
recruiter  as  a  seller  to  anyone  who  is  willing  to  buy,  to  a  role  as  a 
personnel  recruiter  seeking  out  the  best  applicants  for  the  jobs 
the  Army  needs  to  fill.  This  new  emphasis  is  seen  by  the  respondents 
to  conflict  with  the  long  standing  criteria  of  total  contract  numbers. 
Recruiters  report  many  conflicting  demands  being  placed  on  them  concerning  the 
number  and  quality  of  applicants  they  should  be  seeking. 

Development  of  Performance  Criteria:  Research  Needs 

Researchers  in  this  area  need  to  be  aware  of  the  current  flux  in 
recruiting  performance  criteria  if  they  are  to  know  the  limits  on  their  work. 
A recruiter who may be extremely successful in accomplishing the performance
criteria as he or she perceives them may not be considered successful in
accomplishing the criteria perceived by the researcher. Because of the
emergence of recruit quality as an essential component of recruiter
performance evaluation, research is needed on the effect of geopolitical and
socioeconomic variables not only on total productivity but also on categories
of productivity.

In  the  current  research,  we  found  that  fertility  adjustments  are 
acceptable  to  most  recruiters  and  station  commanders  if  the  adjustments  are 
sufficiently  explained  and  are  done  at  a  small  enough  level.  These 
adjustments, however, would almost certainly be better done as adjustments to
the  standards  of  mission  box  objectives,  rather  than  as  adjustments  to 
contract  performance  outcomes.  While  most  recruiters  and  station  commanders 
can  accept  the  logic  for  area  fertility  adjustments  in  judging  performance, 
they  were  less  than  enthusiastic  about  subtracting  a  DRC  correction  number 
from  the  number  of  contracts  produced.  Any  system  which  reduces  the  number  of 
contracts  a  recruiter  is  credited  with,  would  be  perceived  as  unfair  and  would 
probably  lead  to  decreased  motivation. 
THE EFFECTS OF JOB SATISFACTION ON
AIR  FORCE  ENLISTEE  RETENTION 

Kenn  Finstuen,  Ph.D.  and  Charles  N.  Weaver,  Ph.D. 

Manpower  and  Personnel  Division 
Air  Force  Human  Resources  Laboratory  (AFSC) 

Brooks  AFB,  Texas  78235 

Abstract

The purpose of this research was to assess the concurrent and predictive
validity of occupational attitudes, as measured by the Air Force Occupational
Attitude Inventory (OAI), in relation to global job satisfaction, reenlistment
intent, and actual reenlistment behavior of first-term enlisted airmen. The
OAI was administered to two samples of airmen consisting of 1,217 personnel in
1973 and 4,784 personnel in 1975. Multiple linear regression equations were
developed for each of the two years for samples based on reenlistment
eligibility (from which two groups were formed: eligible only and
eligible-ineligible combined) and on separation classifications (from which
two more groups were formed: voluntary only and voluntary-involuntary
combined). When considered as a set, the OAI items were found to bear a
strong relationship to global job satisfaction, a somewhat lesser relationship
to reenlistment intent, and a moderate, but highly consistent relationship to
reenlistment behavior. These relationships maintained their significance when
the baseline effects of 53 biographical and job-related variables were held
constant in regression analyses. Findings indicated that global job
satisfaction was associated with the following: job challenge, the use of
airman abilities, and feelings of accomplishment. Reenlistment intent and
actual reenlistment were most highly associated with satisfaction with pay and
benefits as compared to civilian jobs, the consideration the Air Force gave
enlistees, removal of irritants, and contributions to the national defense.
Cross-validation of the 1973 and 1975 equations revealed that these
relationships were stable through time.

I.  THE  AIR  FORCE  JOB  SATISFACTION  RESEARCH  PROJECT 

Since  1971,  a  comprehensive  program  of  job  satisfaction  research  has  been 
conducted  by  the  Manpower  and  Personnel  Division  of  the  Air  Force  Human 
Resources  Laboratory.  The  objective  of  the  program  was  to  investigate  the 
impact  of  work-related  factors  on  job  satisfaction  and  career  decisions  as  a 
step  toward  reaching  the  goal  of  full  utilization  and  retention  of  qualified 
personnel.  The  basic  elements  of  this  program  were  to  (a)  define  and  measure 
the  dimensions  of  job  satisfaction,  (b)  identify  problem  areas  which  had  the 
greatest  potential  for  improvement  through  job  satisfaction  research,  and  (c) 
assess  the  effects  of  job  changes  on  job  satisfaction  attitudes  and 
reenlistment  decisions  (Gould,  1976,  p.5). 

The  first  phase  of  the  job  satisfaction  research  project  required  that  an 
inventory  be  developed  to  assess  the  dimensions  of  job  satisfaction  in  the 
work environment of the Air Force (Tuttle & Hazel, 1974). In developing the
inventory,  Tuttle,  Gould,  and  Hazel  (1975)  hypothesized  relevant  job 
satisfaction  dimensions  and  produced  a  scale  for  measuring  those  dimensions. 
Gould  (1978)  validated  the  hypothesized  dimensions,  examined  the  rating  scale, 
and  reduced  the  item  pool  to  a  minimum  number  required  to  assess  the  job 
attitude  domain  of  the  Air  Force  work  environment.  The  resulting  inventory, 
the  United  States  Air  Force  Occupational  Attitude  Inventory  (OAI),  is  composed 
of three sections. Section I, General Information, consists of 51 items
concerning Air Force members' biographical background and job-related
information, and attitudes toward reenlistment, global job satisfaction, and
job interest. Section II, Occupational Attitude Information, consists of 200
job satisfaction items, the last 10 of which apply to supervisory work and are
completed only by airmen who supervise others as part of their job. The
satisfaction attitudes of respondents are measured on a 9-point rating scale
ranging from 1 = extremely dissatisfied to 9 = extremely satisfied. Section III,
Importance of Job Aspects to Career Decisions, contains 35 items representing
the dimensions initially hypothesized in the development of the inventory.
The factors are also rated on a 9-point scale, ranging from 1 = not important to
9 = extremely important.

The  OAI  has  been  under  development  and  refinement  at  various  periods  for 
over  8  years,  and  represents  one  of  the  most  comprehensive  and  carefully 
researched job satisfaction measures of those commonly in use (Pritchard &
Shaw,  1978).  Since  its  development,  the  OAI  has  been  used  in  a  number  of  job 
satisfaction  studies.  Gould  (1976)  reviewed  OAI-related  research  through 
September  of  1976  and,  since  then,  OAI-related  research  has  included 
examinations  of  first-term  and  careerist  attitude  differences  (Edwards,  1978) 
and  differences  among  work  roles  (Finstuen,  1981). 

II.  PURPOSE  AND  HYPOTHESES 

The purpose of the present study was to provide knowledge of the
concurrent validity of the OAI against global job satisfaction and
reenlistment intent, and to assess the predictive validity of the OAI against
actual reenlistment. Four hypotheses were proposed:

H1: Global job satisfaction, reenlistment intent, and actual reenlistment
rates for first-term airmen will vary as a function of biographical
attributes, job-related information, and occupational affect as
measured by the OAI.

H2: Functional relationships between the OAI and the attitudinal and
behavioral criteria will be found to exist even when the effects due
to biographical and job-related differences are controlled for or
held constant in prediction.

H3: The attitudinal saliency of specific OAI items displaying the highest
degree of association with global job satisfaction, reenlistment
intent, and actual reenlistment will remain stable across time.

H4: Cross-validation of occupational attitude equations developed for
samples in separate years will result in consistent and significant
predictions of attitudes and reenlistment behavior across time.

III.  METHOD 

Subjects

An  opportunity  to  examine  the  concurrent  and  predictive  validity  of  the 
OAI  was  made  available  when  the  instrument  was  administered  in  1973  and  1975 
to  random  samples  of  enlisted  Air  Force  personnel.  Excluding  career  airmen, 
the  samples  included  1,217  and  4,784  first-term  Air  Force  enlistees 
respectively  for  1973  and  1975,  for  whom  complete  data  were  available. 

Criterion  Data  Sets 

Global job satisfaction, reenlistment intent, and actual reenlistment
behavior were used as criteria. Global job satisfaction was assessed with the
question, "In general, how satisfied are you with your present job?"
Responses were made on an 8-point rating scale ranging from 1 = extremely
dissatisfied to 8 = extremely satisfied. Reenlistment intent was measured by
responses to the question "Do you plan to reenlist at the end of your current
enlistment?" assessed on a 4-point rating scale ranging from 1 = definitely will
not reenlist to 4 = definitely will reenlist. And, of course, reenlistment
behavior was determined on the basis of whether an airman actually reenlisted
or not.

While  concurrent  measures  for  both  global  job  satisfaction  and 
reenlistment  intent  were  included  in  the  1973  and  1975  OAI  surveys,  it  was  not 
possible  to  complete  the  validation  until  the  reenlistment  behavior  criteria 
had  matured.  In  other  words,  sufficient  time  had  to  pass  so  that  a 
reenlistment decision point was reached by all airmen in the samples.
Furthermore,  analyses  were  performed  separately  for  airmen  who  were  eligible 
to  reenlist.  This  required  that  airmen  who  entered  the  service  in  1975  be 
tracked  for  36  months  to  the  point  at  which  qualitative  screening  for 
reenlistment  eligibility  could  take  place. 

Reenlistment  is  one  of  three  types  of  personnel  actions  which  occur  in  the 
course  of  an  airman's  tour  of  duty.  The  other  two  types  of  actions  are  losses 
and  extensions.  Each  event  of  these  three  broad  categories  is  assigned  one  of 
over  200  three-digit  Special  Program  Designator  (SPD)  identification  codes. 
For  the  purpose  of  creating  meaningful  criteria,  the  SPD  codes  were  classified 
on  the  basis  of  the  type  of  discharge  into  voluntary  and  involuntary 
categories. A voluntary loss was defined as a separation initiated by the Air
Force member. Examples of reasons for voluntary separations were to attend an
educational facility, to accept public office, and to join a civilian police
force. An involuntary loss was defined as a separation initiated by the Air
Force.  Examples  of  reasons  for  involuntary  separations  are  shirking,  sexual 
perversion,  misconduct,  and  permanent  physical  disability. 

Beyond division on the basis of voluntary-involuntary separation, the
criteria were further classified on the basis of formal reenlistment
eligibility. Eight attitudinal and reenlistment data sets were developed.
For global job satisfaction and reenlistment intent, there were two
categories, each based on eligibility. These are shown as criteria 1-4
below. For actual reenlistment, there were two categories of
voluntary-involuntary separations for each of the two classifications of
reenlistment eligibility. These criteria, 5-8 below, were dichotomously coded
1 if airmen reenlisted, and zero otherwise. The criterion data sets were:

1. Global job satisfaction: Eligible - Ineligible
2. Global job satisfaction: Eligible Only
3. Reenlistment Intent: Eligible - Ineligible
4. Reenlistment Intent: Eligible Only
5. Actual Reenlistment: Eligible - Ineligible, Voluntary and Involuntary
6. Actual Reenlistment: Eligible - Ineligible, Voluntary Only
7. Actual Reenlistment: Eligible Only, Voluntary - Involuntary
8. Actual Reenlistment: Eligible Only, Voluntary Only

Figure  1  displays  the  combinations  of  outcomes,  discharge  types,  and 
eligibility  classifications  used  for  defining  criterion  data  sets  5-8. 
Fig. 1. Three-dimensional data structures for making predictions
of reenlistment using various separation classifications.

The four criterion data sets portray various separation and reenlistment
outcomes. Separations are classified by formal eligibility, either
eligible to reenlist or ineligible, and by discharge type, either
voluntary or involuntary. Those airmen that reenlist must be eligible.
Obviously the type of discharge dimension does not apply to reenlistees.
Both the 1973 and 1975 samples were coded as shown above.


Predictors and Equations


Two  sets  of  predictors  were  included  in  the  analyses:  189  non-supervisory 
items  from  Section  II  of  the  OAI  and  53  pre-  and  post-enlistment  baseline 
measures,  including  25  biographical  and  28  job-related  control  variables  that 
are  typically  used  in  recruit  selection,  classification,  and  assignment 
actions.  The  biographical  variables  were  Airman  Qualifying  Examination  (AQE) 
aptitude  scores,  race,  sex,  age,  education,  marital  status,  number  of 
dependents,  size  of  hometown,  and  the  amount  of  time  spent  reading. 
Job-related  variables  were  months  of  total  active  federal  military  service 
(TAFMS),  months  on  the  job,  number  of  subordinates,  grade,  and  18  occupational 
membership  categories.  Squared  terms  for  aptitude,  age,  dependents,  and 
number  of  people  supervised  were  generated  to  detect  curvilinear  relationships 
when  conducting  the  regression  analyses. 

Figure 2 portrays the functional relationships between the dependent and
independent variables. Three kinds of multiple linear regression equations
were developed (Bottenberg & Ward, 1963; Ward & Jennings, 1973). The first,
or full model, equations were composed of pre- and post-enlistment control
variables and the 189 OAI items. The second, or restricted model, equations
were limited to the biographical and job-related variables. The third set of
equations, also restricted, were based upon the OAI items alone.
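The three kinds of equations can be illustrated with a minimal sketch. The data below are synthetic stand-ins (an assumption for illustration only, not the study's data), and the helper name `r_squared` is hypothetical. The sketch fits a full model (A), a model restricted to control variables (B), and a model restricted to attitude items (C), and reports the squared multiple correlation for each.

```python
import numpy as np

def r_squared(predictors, criterion):
    """Squared multiple correlation from an ordinary least-squares fit."""
    design = np.column_stack([np.ones(len(criterion)), predictors])
    weights, *_ = np.linalg.lstsq(design, criterion, rcond=None)
    residuals = criterion - design @ weights
    return 1.0 - residuals.var() / criterion.var()

# Synthetic stand-ins (assumed): 4 control variables, plus the square of
# the first one as a curvilinear term, and 6 attitude items.
rng = np.random.default_rng(1)
n = 500
controls = rng.normal(size=(n, 4))
controls = np.column_stack([controls, controls[:, 0] ** 2])
attitudes = rng.normal(size=(n, 6))
criterion = (0.3 * controls[:, 0] + 0.2 * controls[:, 4]
             + attitudes @ np.full(6, 0.25) + rng.normal(size=n))

r2_full = r_squared(np.column_stack([controls, attitudes]), criterion)  # model A
r2_controls = r_squared(controls, criterion)                            # model B
r2_attitudes = r_squared(attitudes, criterion)                          # model C
print(f"A={r2_full:.2f}  B={r2_controls:.2f}  C={r2_attitudes:.2f}")
```

Because models B and C are nested within model A, the full-model value can never fall below either restricted value; the size of the drop is what the comparison of full and restricted models evaluates.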

RESULTS 

Table  1  presents  descriptive  statistics  for  the  1973  and  1975  samples 
based  on  the  various  criterion  data  set  classifications.  Both  attitudes  and 
reenlistment  rates  were  somewhat  higher  for  the  eligible  only 
classifications.  With  the  exception  of  the  eligible  only-voluntary- 
involuntary  data  sets,  attitudes  and  reenlistment  rates  appeared  similar  for 
both 1973 and 1975. Overall, some 60 to 70% of the enlisted airmen did not
reenlist.


Table 1

Criteria Development - Means and Standard Deviations for
Global Job Satisfaction, Reenlistment Intent, and Actual Reenlistment Rates

                                 Eligible Only           Eligible - Ineligible
Criteria               Year      N       Mean    S.D.    N       Mean    S.D.

Attitudes
  Global Job           1973      961     4.75    2.10    1,217   4.65    2.14
  Satisfaction         1975    3,753     4.82    2.11    4,784   4.69    2.15
  Reenlistment         1973      961     1.95     .84    1,217   1.91     .84
  Intent               1975    3,753     2.29     .99    4,784   2.22     .99

Reenlistment Behavior (Percent Retained)
                                 N       %               N       %
  Voluntary -          1973      896     33.19           1,131   29.00
  Involuntary          1975    2,993     40.83           4,017   30.92
  Voluntary Only       1973      835     38.20             968   33.88
                       1975    2,988     40.90           3,650   34.03

Note. Global job satisfaction was scaled 1 = extremely dissatisfied to
8 = extremely satisfied. Reenlistment intent was scaled 1 = definitely will not
reenlist to 4 = definitely will reenlist.


Fig.  2.  Schematic  diagram  of  dependent  and  independent  variables  used  in  the  study. 


Table  2  presents  the  multiple  correlation  results  for  all  criterion  data 
sets.  In  support  of  hypothesis  1,  the  statistically  significant  results  for 
the  full  models  (A)  indicated  that  all  criteria  varied  as  a  function  of 
biographical, job-related, and occupational attitude variables. To determine
the  effects  of  the  OAI  item  set,  the  multiple  squared  correlation  coefficients 

Table 2

Regression Analysis and Validation Summary
for Global Job Satisfaction, Reenlistment Intent, and Retention

                             Full         Restricted                            Restricted
                             Models (A)   Models (B)        (A) vs (B)          Models (C)
                                          [OAI Removed]                         [OAI Only]
Criterion                    R2           R2             df1    df2     F(a)    R2

1973 Survey

Global Job Satisfaction
  Eligible-Ineligible        .71          .13            189      979   10.02*  .67
  Eligible Only              .76          .17            189      723    9.38*  .74
Reenlistment Intent
  Eligible-Ineligible        .46          .16            189      979    2.86*  .41
  Eligible Only              .51          .19            189      723    2.53*  .45
Retention
  Eligible-Ineligible
    Voluntary-Involuntary    .34          .15            189      893    1.36*  .24
    Voluntary Only           .37          .14            189      730    1.37*  .28
  Eligible Only
    Voluntary-Involuntary    .39          .16            189      723    1.46*  .30
    Voluntary Only           .44          .17            189      597    1.56*  .35

1975 Survey

Global Job Satisfaction
  Eligible-Ineligible        .60          .09            189    4,546   31.21*  .59
  Eligible Only              .61          .08            189    3,515   25.02*  .59
Reenlistment Intent
  Eligible-Ineligible        .33          .12            189    4,546    7.78*  .28
  Eligible Only              .35          .12            189    3,515    6.55*  .30
Retention
  Eligible-Ineligible
    Voluntary-Involuntary    .21          .12            189    3,779    2.33*  .13
    Voluntary Only           .20          .10            189    3,412    2.28*  .13
  Eligible Only
    Voluntary-Involuntary    .21          .10            189    2,755    2.15*  .16
    Voluntary Only           .22          .10            189    2,750    2.15*  .16

(a) All F tests comparing full (A) and restricted (B) models were significant,
*p < .01.

Note. Full models (A) contain OAI, biographical, and job-related variables. For
restricted models (B) the OAI items have been removed, and restricted models (C)
contain only OAI attitude scores. All full model (A) and OAI model (C)
coefficients were statistically different from zero, p < .01.




from the full regression models were tested against regression results from
models restricted to biographical and job-related variables (B). As shown by
the F test results in Table 2, the removal of the OAI item set from the
regression equations was statistically significant for all criterion data
sets, supporting the second hypothesis concerning OAI relationships with the
criteria in the presence of the biographical and job-related control
variables. These results were interpreted as providing substantive evidence
for the linkage between the OAI and global job satisfaction, reenlistment
intent, and reenlistment behavior. The last column of Table 2 presents the
squared multiple correlations for the OAI equations (C). All coefficients
were significantly different from zero. Prediction of reenlistment behavior
appeared to be somewhat greater in magnitude for eligible airmen than for the
eligible-ineligible classification. As would be expected, the concurrent
validations of the OAI against attitudes of global job satisfaction and
reenlistment intent were somewhat higher than the predictive validations
against reenlistment behavior, both for 1973 and for 1975 survey samples.
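The comparison of full and restricted models can be sketched with the standard F statistic for nested regression models, F = ((R2_full - R2_restricted) / df1) / ((1 - R2_full) / df2). The paper does not print the formula it used, so treating this as the computation behind Table 2 is an assumption; the function name `f_change` is hypothetical. The inputs are the two-decimal values printed for the 1973 global job satisfaction rows.

```python
def f_change(r2_full, r2_restricted, df1, df2):
    """F statistic for the loss in R^2 when df1 predictors are removed."""
    return ((r2_full - r2_restricted) / df1) / ((1.0 - r2_full) / df2)

# 1973 survey, global job satisfaction, values as printed in Table 2.
f_all = f_change(0.71, 0.13, 189, 979)    # eligible-ineligible; table reports 10.02
f_elig = f_change(0.76, 0.17, 189, 723)   # eligible only; table reports 9.38
print(round(f_all, 2), round(f_elig, 2))
```

Both results land close to the tabled F values (roughly 10.4 and 9.4); the small gaps are presumably due to the R2 values being rounded to two decimals in the table.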

Specific  Occupational  Attitude  Item  Contributions 

The third hypothesis proposed that the attitudinal saliency of specific
OAI items would be similar across years for the separate criterion measures.
The 189 OAI items were consecutively entered into multiple regression
equations using a stepwise technique. Results from the final stepwise
equations were examined to determine the relative predictive efficacy of
individual items. The zero order correlations and the order of entry for the
five most significantly predictive items associated with the criteria are
presented in Table 3. Since prior findings indicate that the eligible only
classifications produce higher correlations, the results are limited to the
eligible only data sets.

Table  3 

Criterion  Correlations  and  Order  of  Entry  in  Stepwise 
Regressions  for  the  Top  Five  OAI  Items 


Specific Job Satisfaction
Item from OAI

Challenge provided by job
Way job uses abilities
Accomplishment feelings
Amount of interesting work
Supv brings out best
Pace of your work

Global Job
Satisfaction
1973 1975

.73(1)

.73(2)
.71(3)

.65(1)

.63(2)

.63(4)

.66(3)

Reenlistment
Intent
1973 1975

Retention
1973 1975

.41(4)

.49(5)

>5(5)

Pay compared with outside

.35(5)

.35(1)

.20(4)

.22(1)

Consideration given by A.F.

.38(1)

.33(2)

.23(2)

.20(2)

Benefits compared to outside

Social position in A.F.

.38(2)

.33(3)

.24(1)

.20(3)

A.F. removes irritants

.36(3)

.31(5)

Contribution to nat'l defense
Information on promotions

.32(4)

.26(4)

-.02(3)

Recreation in community
Weighted Amn Promotion System
Educational opportunities

Data sets (eligible only) N=

961

.20(4)
.16(5)

961

3,753

3,753

.14(5)
835

2,988

Three major inferences may be drawn from an inspection of the
relationships displayed in Table 3. First, global job satisfaction appears to
be associated with a different domain of specific occupational attitudes than
reenlistment intent and behavior. Challenge, use of abilities, accomplishment
feelings, and the pace of the work are common to both 1973 and 1975.
Secondly, reenlistment intent and actual reenlistment behavior appear to be
aligned on three items across all years, viz., pay and benefits compared to
civilian jobs, and the consideration given airmen by the Air Force. Two other
items aligned with reenlistment intent across both years were the removal of
irritants and contributions to the national defense. Airmen indicating low
attitude scores on these items are more likely to separate than airmen
indicating that they are satisfied with these issues. Finally, actual
reenlistment behavior also appears to be influenced by social position and
educational opportunities in 1973, shifting toward recreation and promotion
concerns in 1975.

Cross-validation  of  the  OAI  Equations 

The final phase of this research project examined hypothesis d concerning
the consistency of the regression equations. Dual cross-validations were
conducted upon all criterion data sets by applying the 1973 least squares
regression weights to the 1975 data sets, and vice versa. The resulting
coefficients were then tested against a correlation of zero to determine
whether the weights associated with the specific regression equations were
stable enough to produce an acceptable level of prediction in another sample.
Results from all F tests were significant (p < .01), and indicated that the
specific occupational attitude effects were consistent across time, and that
the multiple relationships observed in the development samples were not
entirely attributable to capitalization upon specific sample variance, but
rather were indicative of stable patterns which could be replicated in
other samples.
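
The dual cross-validation described above can be sketched in a few lines. This
is only an illustration under stated assumptions: the data are made up, the
equation is cut down to two hypothetical attitude items, and the function names
are invented for the example. The study's actual equations used many more
predictors, and the F test of each cross-validity coefficient against zero is
not reproduced here.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def fit_two_predictor_ols(x1, x2, y):
    """Least-squares weights (b0, b1, b2) for y ~ b0 + b1*x1 + b2*x2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero only if the two predictors are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def predict(weights, x1, x2):
    b0, b1, b2 = weights
    return [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]


def pearson_r(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def cross_validity(development, holdout):
    """Fit weights on one sample, then correlate predictions with the
    observed criterion in the other sample."""
    weights = fit_two_predictor_ols(*development)
    return pearson_r(predict(weights, holdout[0], holdout[1]), holdout[2])


# Made-up illustrative samples: (item 1 scores, item 2 scores, criterion).
sample_1973 = ([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [2, 2, 4, 4, 6, 7])
sample_1975 = ([1, 3, 2, 5, 4, 6], [1, 2, 4, 3, 5, 6], [1, 3, 3, 5, 5, 7])

r_73_to_75 = cross_validity(sample_1973, sample_1975)  # 1973 weights on 1975 data
r_75_to_73 = cross_validity(sample_1975, sample_1973)  # and vice versa
print(round(r_73_to_75, 3), round(r_75_to_73, 3))
```

A cross-validity coefficient that stays well above zero in both directions is
the pattern the text describes as evidence of stable weights.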

CONCLUSIONS 

The accomplishment of the validation of the OAI with respect to global job
satisfaction, reenlistment intent, and reenlistment behavior resulted in the
following conclusions.

1.  The OAI possessed concurrent validity against expressions of global
job satisfaction and reenlistment intent and predictive validity with respect
to actual reenlistment behavior. This validity was demonstrated for the OAI
with and without the consideration of baseline effects due to a number of
biographical and job-related factors. Cross-validation results further
indicated that these findings were generalizable across time.

2.  The  major  areas  of  OAI  attitudes  identified  as  having  important 
influences  on  global  job  satisfaction  were:  the  challenge  provided  by  one's 
job,  the  way  the  job  uses  one's  abilities,  the  amount  of  interesting  work  one 
does,  the  feelings  of  accomplishment  one  gets  from  the  work,  and  the  pace  of 
work. 


3.  The  major  areas  of  OAI  attitudes  identified  as  having  important 
influences  on  reenlistment  intentions  were:  pay  compared  with  a  civilian  job, 
consideration  given  by  the  Air  Force,  fringe  benefits  compared  with  a  civilian 
job,  Air  Force  efforts  to  remove  irritants,  and  contributions  to  the  national 
defense. 


4.  The  major  areas  of  OAI  attitudes  identified  as  having  important 
influences  on  actual  reenlistment  rates  were:  pay  compared  with  a  civilian 
job,  consideration  provided  by  the  Air  Force,  fringe  benefits  compared  with  a 
civilian  job,  social  position,  promotion  concerns,  and  recreational  and 
educational  opportunities. 

5.  While the OAI was successfully validated against global job
satisfaction, reenlistment intent, and actual reenlistment behavior, it was
found that the most highly related OAI items were different for global job
satisfaction as compared with reenlistment intent and reenlistment behavior.
Thus, it should be understood that enhancement of global job satisfaction may
not necessarily bring about improvements in reenlistment intent and actual
reenlistment. Likewise, successful efforts in relation to increasing
reenlistment may not impact global job satisfaction concerns of first-term
enlisted airmen.

6.  The  findings  of  this  study  emphasize  that  post-enlistment 
occupational  attitudes  are  important  in  addition  to  selection,  classification, 
and  assignment  concerns  related  to  global  job  satisfaction  and  reenlistment 
intent.  More  importantly,  specific  areas  were  shown  to  influence  actual 
reenlistment  rates.  Efforts  to  enhance  retention  and  limit  separations  in  the 
first  term  could  be  realized  by  integrating  these  specific  issues  with 
curriculum  materials  for  the  career  advisor,  NCO  Academy,  and  enlisted 
supervisor and management training programs.

Differential Effectiveness and Efficiency of Individualized Instruction:
II. Major Findings


This paper presents the major findings of the TAEG differential
effectiveness and efficiency study. Multiple regression analyses
indicated significant differences in fleet supervisor ratings for
graduates of individualized versus conventional instruction. These
differences were related to different kinds of training tasks, but not
to ability levels of graduates. Significant interactions between
method of instruction and type of task, and between method of
instruction and ability level, were found with respect to school
achievement (for both course completion times and final course grades).
The findings are discussed in terms of their utility for instructional
design.

DIFFERENTIAL EFFECTIVENESS AND EFFICIENCY OF INDIVIDUALIZED INSTRUCTION:

II. MAJOR FINDINGS


Jon  S.  Freda,  Eugene  R.  Hall,  and  Larry  H.  Ford 

Training  Analysis  and  Evaluation  Group 
Orlando,  Florida  32813 


Individualized  instruction  has  become  a  controversial  issue  in  military 
training.  Many  individuals  in  both  training  and  operational  settings  have 
come  to  believe  that  individualized  instruction  (II)  is  not  a  desirable  or 
effective  way  to  train  students  for  operational  job  assignments.  The 
widespread  belief  is  that  conventional  classroom  group-paced  (GP)  methods 
result  in  better  trained  personnel. 

Currently,  the  U.S.  Navy  conducts  technical  training  under  both  II  and  GP 
instructional  methods.  The  principal  II  methods  used  are  computer  managed 
instruction  (CMI)  and  self-paced  (SP),  or  instructor  managed  instruction 
(IMI).  Because  of  the  potential  for  reduced  student  training  time,  the  Navy 
plans  to  individualize  still  more  of  its  courses.  However,  in  view  of 
concerns  expressed  by  fleet  units,  the  Chief  of  Naval  Education  and  Training 
(CNET)  tasked  the  Training  Analysis  and  Evaluation  Group  (TAEG)  to  conduct  a 
study  to  examine  the  effects  of  individualized  instruction. 

PURPOSE 

The  purpose  of  the  study  was  to  determine  if  individualized  instruction 
is  more  or  less  effective  and/or  efficient  than  conventional  instruction,  and 
further,  if  these  effects  differentially  relate  to: 

.  training individuals of differing ability levels, and/or
.  training different types of tasks.

The  present  paper  presents  selected  major  findings  of  this  study.  The 
previous  paper  in  this  volume  (Hall  and  Freda)  presents  details  of  the 
methodology  employed  to  conduct  this  study. 

ANALYTICAL  STRATEGY 

A partial hierarchical regression model was employed to examine the
effects of each set of predictors on the criterion variables (Cohen & Cohen,
1975; Kim & Kohout, 1975). This model allowed a unique partitioning of the
total variance of each criterion to be accounted for by each subset of
predictors entered into the regression equation. The use of a multiple
regression approach reflects current methodological approaches used in
investigating aptitude-treatment interactions (Cronbach & Snow, 1977).
Predictor variables were considered statistically significant and relevant if
they (1) met the acceptable level of significance (p < .05) and (2) explained
a relevant amount of variance in the criterion variable (incrementing the
multiple R² by at least 2 percent).
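
The "2 percent increment" criterion can be illustrated as follows. This is a
minimal sketch, not the authors' analysis: the data are invented, the variable
names (`ability`, `method`, `hours`) are hypothetical, and only the increment
in R² and its F statistic are computed. Obtaining the p value would also
require the F distribution with (df1, df2) degrees of freedom, which is left
to a table or a statistics package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the predictor columns plus intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(rows[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def increment(base_cols, added_cols, y):
    """Increase in R^2 when added_cols enter after base_cols, with its F."""
    r2_base = r_squared(base_cols, y)
    r2_full = r_squared(base_cols + added_cols, y)
    delta = r2_full - r2_base
    df1 = len(added_cols)
    df2 = len(y) - len(base_cols) - len(added_cols) - 1
    f_stat = (delta / df1) / ((1.0 - r2_full) / df2)
    return delta, f_stat


# Invented data: ability score, method of instruction (0/1), criterion.
ability = [1, 2, 3, 4, 5, 6, 7, 8]
method = [0, 1, 0, 1, 0, 1, 0, 1]
hours = [10, 14, 12, 17, 15, 19, 16, 22]

delta_r2, f_stat = increment([ability], [method], hours)
meets_relevance_rule = delta_r2 >= 0.02  # the paper's 2 percent threshold
print(round(delta_r2, 3), round(f_stat, 2), meets_relevance_rule)
```

Entering predictor sets in a fixed order and crediting each set only with the
variance it adds is what makes the partitioning unique.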


RESULTS 


The results are presented first for the training efficiency measures and
then for the training effectiveness measures. Within each of these measures,
significant predictors (both main effects and interactions) are delineated for
each criterion variable. The results presented below are based on a data
analysis across courses (see table 1). Only statistically significant results
are presented.


Table 1.

Original Sample Size of Graduates by Method of Instruction


Method of Instruction                       N

Self-Paced (SP)                          1487
Computer Managed Instruction (CMI)        823
Individualized (SP + CMI)                2310
Conventional (Group-Paced (GP))          1696

Total                                    4006


Training  Efficiency  Measures 

Criterion:  Time  to  Complete  the  Course 

Method of Instruction. II (SP + CMI) graduates completed their courses in a
shorter period of time than CI (GP) graduates. CMI graduates completed their
courses in a shorter period of time than SP graduates (see table 2).


Table 2.

Mean Time to Complete the Course (Contact Hours)


Method of Instruction      Mean     S.D.¹    S.E.M.²       N

SP                       166.35     69.60      1.94     1283
CMI                      139.57     88.25      3.11      803
II (SP+CMI)              156.04     82.57      1.81     2086
CI (GP)                  195.22    122.05      2.95     1707

¹S.D. = Standard Deviation
²S.E.M. = Standard Error of the Mean

Ability Level. In general, graduates with higher AFQT percentiles finished
their courses in a shorter period of time than those with lower AFQT percentiles
(see table 3).
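
As a reading aid, the columns of Table 2 can be checked against one another.
The sketch below assumes the usual definition S.E.M. = S.D. / sqrt(N) and
treats the II row as the N-weighted combination of the SP and CMI rows. Both
are assumptions about how the table was built, not statements from the paper,
and the function names are invented for the example.

```python
from math import sqrt

# (mean, S.D., reported S.E.M., N) copied from Table 2.
table2 = {
    "SP": (166.35, 69.60, 1.94, 1283),
    "CMI": (139.57, 88.25, 3.11, 803),
    "II (SP+CMI)": (156.04, 82.57, 1.81, 2086),
    "CI (GP)": (195.22, 122.05, 2.95, 1707),
}


def standard_error(sd, n):
    """Standard error of the mean from a standard deviation and sample size."""
    return sd / sqrt(n)


def pooled_mean(groups):
    """N-weighted mean of several (mean, n) groups."""
    total_n = sum(n for _, n in groups)
    return sum(m * n for m, n in groups) / total_n


for label, (mean, sd, sem, n) in table2.items():
    computed = standard_error(sd, n)
    print(f"{label:12s} computed S.E.M. = {computed:.2f} (reported {sem:.2f})")

ii_mean = pooled_mean([(166.35, 1283), (139.57, 803)])
print(f"N-weighted SP and CMI mean = {ii_mean:.2f} (reported II mean 156.04)")
```

The same arithmetic applies to the other tables of means in this paper.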

Table 3.

Mean Time to Complete the Course (Contact Hours)


Percentile Range   Mental Category     Mean     S.D.    S.E.M.       N

93-99                    1           134.64    93.05     6.53      203
65-92                    2           178.19   113.29     3.39     1114
49-64                    3U          174.11   103.45     2.63     1551
31-48                    3L          172.12    91.30     3.23      800
21-30                    4U          191.73    83.40    14.30       34
10-20                    4L          480.00     0.0      0.0         1

Method of Instruction by Ability Level. II graduates in the upper mental
categories finished their courses in a shorter period of time than CI graduates
in the upper mental categories. Both II (SP + CMI) and CI (GP) graduates in
the mid and lower mental categories took about the same amount of time to
complete their courses (see figure 1).


[Figure 1 axis: Mental Category 4U, 3L, 3U, 2, 1 (AFQT Percentile Range
21-30, 31-48, 49-64, 65-92, 93-99).]

Figure 1.  Mean Time to Complete the Course by Method of Instruction
and Ability Level (i.e., Mental Category/AFQT Percentile
Range). Mean Data Points Based on Less Than Five Graduates
Are Not Plotted.

Training Task. In general, more fact tasks were taught in courses
associated with longer completion times than with shorter completion times.
More category, procedure, and rule tasks were taught in courses associated
with shorter completion times than with longer completion times.


Method of Instruction by Training Task. II graduates had shorter
completion times than CI graduates in courses that taught a smaller percentage
of fact, category, procedure, and rule tasks. Both II and CI graduates took
about the same amount of time to complete courses that taught a greater
percentage of these tasks, excluding fact tasks (see figure 2).


[Figure 2 axes: Mean Time to Complete the Course (Hours) by Percentage of
Task Taught (1-10, 11-20, 1-50, 51-100).]

Figure 2.  Mean Time to Complete the Course by Method of Instruction
and Type of Task. There Is Only One Mean Data Point for II
Fact Tasks.


Criterion:  Training  Costs 

Training costs refer to the total training costs to produce one graduate
per course session. Training costs differ by school, and all the graduates
of a school are assigned the same value of total training costs for that
particular school. Thus, total training costs is a course level variable:
each student does not have a unique training cost assigned to him/her;
rather, training costs are assigned at the school (course) level.
Therefore, training costs were not entered into the multiple regression model
as a criterion variable, owing to the lack of variance in training costs
within courses. Regression analyses of derived individual level training
costs, as well as aggregated course level training costs, will be contained in
reports currently being prepared by the TAEG.

Training Effectiveness Measures


Criterion:  End  of  Course  Grades 


Method of Instruction. II (SP + CMI) graduates received higher end of
course grades than CI (GP) graduates. SP graduates received higher end of
course grades than CMI graduates (see table 4).


Table 4.

Mean End of Course Grade by Method of Instruction


Method of Instruction      Mean     S.D.    S.E.M.       N

SP                        90.84     5.51      .23      584
CMI                       85.74     8.24      .31      692
II (SP+CMI)               88.08     7.56      .21     1276
CI (GP)                   82.10     6.72      .24      768
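
For a rough sense of scale, the II versus CI difference in Table 4 can be set
against its standard error. The sketch below is not the authors' regression
analysis: it treats the two groups as independent samples, ignores the other
predictors in their model, and uses a normal approximation (z = 1.96) for an
approximate 95 percent interval. The function name is hypothetical.

```python
from math import sqrt


def mean_difference(mean_a, sem_a, mean_b, sem_b, z=1.96):
    """Difference between two independent means, its standard error, and an
    approximate interval of +/- z standard errors around the difference."""
    diff = mean_a - mean_b
    se = sqrt(sem_a ** 2 + sem_b ** 2)
    return diff, se, (diff - z * se, diff + z * se)


# Means and S.E.M.s for II (SP+CMI) and CI (GP) taken from Table 4.
diff, se, (low, high) = mean_difference(88.08, 0.21, 82.10, 0.24)
print(f"difference = {diff:.2f} grade points, S.E. = {se:.2f}")
print(f"approximate 95% interval: {low:.2f} to {high:.2f}")
```

Under these simplifying assumptions the interval lies well above zero, which is
consistent with the direction of the finding reported in the text.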

Ability Level. In general, graduates with higher AFQT percentiles received
higher end of course grades than those with lower AFQT percentiles (see
table 5).

Table 5.

Mean End of Course Grade by AFQT Percentile Range and Mental Category


AFQT
Percentile Range   Mental Category     Mean     S.D.    S.E.M.       N

93-99                    1            92.78     6.51      .56      135
65-92                    2            87.45     7.63      .29      673
49-64                    3U           84.55     7.32      .27      737
31-48                    3L           83.53     7.63      .37      436
21-30                    4U           81.87     7.83     1.71       21
10-20                    4L           79.50     0.0      0.0         1

Method of Instruction by Ability Level. II (SP + CMI) graduates with higher
AFQT percentiles received higher end of course grades than CI (GP) graduates
with higher AFQT percentiles. II and CI graduates with lower AFQT
percentiles received similar end of course grades. CI graduates with higher
AFQT percentiles received end of course grades similar to those of CI
graduates with lower AFQT percentiles. For both SP and CMI graduates, AFQT
was positively related to end of course grade. There were no significant
differences between SP and CMI graduates on end of course grades by ability
level (see figure 3).


[Figure 3 axis: Mental Category 4L, 4U, 3L, 3U, 2, 1 (AFQT Percentile Range
10-20, 21-30, 31-48, 49-64, 65-92, 93-99).]

Figure 3.  Mean End of Course Grade by Method of Instruction and
Ability Level (Mental Category/AFQT Percentile Range).
Mean Data Points With Less Than Five Graduates Are
Not Plotted.


Training  Task.  In  general,  graduates  received  higher  grades  in  courses  that 
taught  more  category,  procedure,  and  rule  tasks,  and  received  lower  grades  in 
courses  that  taught  more  fact  tasks. 

Method of Instruction by Training Task. II graduates received higher grades
than CI graduates in courses that taught a smaller percentage of fact and
category tasks. This difference between II and CI graduates is accentuated
even more in courses that teach a larger percentage of category tasks (see
figure 4).


[Figure 4 axis: Percentage of Task Taught (1-10, 11-20, 1-50, 51-100).]

Figure 4.  Mean End of Course Grade by Method of Instruction and
Type of Training Task. There Is Only One Mean Data Point
for II Fact Tasks.

Criterion: Training Appraisal System (TAS) Ratings

Method of Instruction. II (SP + CMI) graduates received higher TAS
ratings than CI (GP) graduates. There were no significant differences between
SP and CMI graduates on TAS ratings (see table 6).

Table 6.

Mean TAS Ratings by Method of Instruction


Method of Instruction      Mean     S.D.    S.E.M.       N

SP                         2.90      .70      .02      899
CMI                        2.96      .57      .03      287
II (SP+CMI)                2.91      .67      .02     1186
CI (GP)                    2.69      .67      .02     1129

Training Task. In general, graduates who attended courses that taught a
smaller percentage (less than 10 percent) of fact and category tasks received
higher (mean = 2.90) TAS ratings than graduates who were taught a greater
percentage (20-30 percent) of these tasks (mean = 2.70).

DISCUSSION 

II graduates completed their courses in a shorter period of time and
received higher end of course grades and training adequacy ratings than CI
graduates in this study. These differences between II and CI graduates are
also related to different kinds of training tasks and ability levels of the
graduates. These results are discussed in terms of school and fleet
performance measures. School performance criterion measures (end of course
grades, time to complete) revealed significant two-way interactions of
method of instruction with ability level and with training task. In terms of
training efficiency, higher mental category II graduates completed their
courses in less time than higher mental category CI graduates. There were no
significant differences between lower mental category II and CI graduates in
terms of time to complete. With respect to training effectiveness, II
graduates with higher ability levels received higher end of course grades than
CI graduates with higher ability levels. II and CI graduates with lower
ability levels received similar end of course grades.

The relationship observed for the II graduates agrees with the generally
reported finding of general ability measures predicting learning of new
material (Cronbach and Snow, 1977). However, the relationship found for the
CI graduates requires reflection as to its source. It is possible that CI
courses with longer durations unintentionally selected higher mental category
students than CI courses with shorter durations. This selection could have
come about through the ASVAB composite entrance requirements for each CI
course being related to AFQT percentiles.

An alternative explanation is that CI provides a less demanding and/or
less controllable environment for the higher ability students. It has been
suggested that learning depends on general intellectual development to a
greater degree when active intellectual work is required of the student.
Methods of instruction that reduce the intellectual demand often reduce the
differences between high and low ability students (Cronbach and Snow, 1977).
If these methods are applied to instruction of low ability students over a
long period, many low ability students may equal or excel high ability
students in terms of their mastery of lesson content. This explanation is
supported by the facts that (1) CI graduates cannot control the length of
stay within the course, (2) CI graduates received similar end of course
grades regardless of mental category, and (3) CI graduates obtained lower end
of course grades than II graduates when taught a greater percentage of
complex training tasks.

For fleet performance criterion measures (TAS ratings), ability level of
the graduates was not related to the training adequacy rating. Overall,
graduates who were taught a smaller percentage of fact and category tasks
received higher TAS ratings than those taught a greater percentage of these
tasks. It is noted that this latter finding is similar to the relationship of
fact tasks with end of course grades, but opposite to the relationship of
category tasks with end of course grades. Apparently, the percentage of fact
tasks taught in II and CI courses acts as a reliable indicator of end of
course grades and TAS ratings, whereas the influence of category tasks may be
more susceptible to interfering factors during training and/or during the time
interval between school graduation and fleet performance rating.

It  is  also  noted  that  the  sex  of  graduate  and  geographic  location  of  the 
school  were  not  significantly  related  to  school  or  fleet  performance  measures. 

Based on these results, it is suggested that higher ability students
could be tracked into a more individualized instructional environment and
lower ability students be tracked into a more conventional instructional
environment. This suggestion assumes that an experimental setting could be
designed in which a specific course would have conventional (instructor-group
oriented) instruction aspects and individualized (self-pacing) instruction
aspects.

It  is  also  suggested  that  the  "method  of  instruction"  distinction  could, 
then,  be  dropped  in  favor  of  a  training  task  classification.  This  training 
task  classification  could  be  used,  for  example,  to  select  incoming  students 
for  the  individualized  or  conventional  aspects  of  the  course  based  on  ability- 
task  entrance  scores. 


CONCLUSIONS 

Individualized instruction (II) compared favorably to conventional
instruction (CI) with respect to the school and fleet performance measures
used in this study. Ability level of graduates and type of training task
significantly differentiated the school performance differences between II and
CI, but did not interact to facilitate explanation of the differences between
II and CI on fleet performance measures. There were no significant school or
fleet performance differences due to different types of training tasks
presented to different ability levels of graduates. The advantage of
classifying the methods of instruction into different types of training tasks,
then, appears to be one of clarifying the relative amounts of different kinds
of information taught in the courses. Also, ability level of graduates was
not related to fleet performance measures. Thus, the traditional factors
described in the aptitude/treatment interaction literature (i.e., ability
level, training task, and method of instruction) were shown to have more
predictive power (i.e., larger number of significant main effects and
interactions) for school performance measures, but to have less predictive
power for fleet performance measures in this study.

Applied Science Associates, Inc. has developed and applied a CTEA
methodology that can be used with developing, as well as already fielded,
equipment systems. The methodology is grounded in a thorough front-end
analysis of the job and training requirements. The products of these
analyses feed not only the cost and effectiveness trade-off analysis but
also provide a significant portion of the structure and content for actual
course design. Trainability assessment and training alternative selection
become less significant issues when compared to the usefulness of the products
that are relevant to course development.

Introduction 

The El Paso Office of Applied Science Associates, Inc., has developed
an approach to the conduct of a Cost and Training Effectiveness Analysis
that emphasizes a thorough analysis of a job and its tasks to specify the
functional learning requirements of the training system. The procedures
were developed for formalizing the collection, analysis, and integration
of training systems cost, impact and effectiveness data. The primary
purpose of the analysis and integration of training-related data concerns the
comparison of the cost and effectiveness of alternative training programs
for meeting pre-defined performance objectives. The secondary purpose of
the analysis is the determination of the trainability of the job performance
requirements under the specific training constraints and conditions.

These two purposes, cost effectiveness and trainability, are common
to most CTEA methodologies. ASA's approach to fulfilling these purposes
differs somewhat in the assessment of training effectiveness and the
determination of trainability. The major procedural differences lie in the
use of a technique for quantifying training effectiveness, and in the use
of an extended program of instruction (POI) for determining trainability.
These techniques will be briefly covered later, but first the primary
purpose of this paper will be addressed: the development of products for users
other than the traditional user of CTEA results.

The basis of the ASA approach is a human factors oriented front end
analysis (FEA) that is conducted with the following additional purposes in
mind:

1.  Training requirements

2.  Training system design

3.  Training device specification

4.  Training management

5.  Training course design

6.  Training system design (and career development)

7.  Training performance assessment

8.  Training support plan to include support products

Background 

These issues are addressed as a necessary part of the complete CTEA
as carried out following ASA procedures. ASA's CTEA philosophy is that
the validity and completeness of the analytical process is dependent upon
the FEA. If the FEA is thorough, the resulting CTEA decisions objectively
fall out of the analysis and relatively few instructional delivery system
decisions have to be made. Using the augmented CTEA procedures for two Army
systems, we found that our analytical process yielded outputs beyond those
usually expected from a CTEA. The procedures emphasize a thorough, detailed
FEA that provides data and information of use to others than the usual CTEA
users, such as course designers, course developers, the training device
system, career development planners, performance evaluation developers,
training managers, course instructors and training support planners.

CTEA requirements were instituted with the introduction of the Life
Cycle System Management Model, which guides and controls the
conceptualization, development and deployment of major materiel systems. As
initially conceptualized, the CTEA procedures addressed the two major
purposes. First, the analysis concerned the comparison of the cost and
effectiveness of alternative training programs so that a program could be
selected for meeting defined performance objectives. ASA's CTEA, as do most
other approaches, assumed that the performance objectives, or at least task
statements, would have been previously prepared and available for analysis.
This assumption proved to be not entirely correct, and job analysis
procedures had to be added to the CTEA process. The second concern was with
the determination of the trainability of the job performance requirements
under the specific training conditions. It became obvious as we got into the
two CTEA projects that issues in addition to the training conditions impacted
trainability in a way so as to constrain the degrees of freedom in program
design and delivery options. As a result, a procedure was adopted to identify
the constraining factors as early as possible in a CTEA project. Doing this
keeps the analyst from wasting time and effort in the pursuit of impossible
training alternatives.

As we reviewed the procedures for conducting the job analysis and
trainability determination, it was decided that the best way to carry out
the several analyses leading to the evaluation of training alternatives
was to use as complete a description of the training requirements as
possible. To present these requirements we adopted a program of instruction
(POI) format. Actually, more information was to be presented than is usual,
so the term "extended POI" was adopted. The information presented in the
extended POI was generated through several sets of analytical procedures
beginning with the specification of tasks. The sets of procedures were
designed to meet several process objectives, such as standardization of
procedures, reduction of requirements for personnel time and resources,
provision of an objective audit trail, quantification of the training
alternatives comparisons and establishment and management of a reusable data
base.

ASA's  CTEA  Process 


I want to stress that ASA's CTEA process accomplishes the purposes
required in the LCSMM, as do most other approaches. However, ASA's approach
provides additional products from the analytical procedures that result in
a more integrated and resource efficient training development system.

The  various  blocks  of  procedures  used  in  our  approach  will  now  be 
briefly  presented.  There  are  ten  major  blocks  of  activities  as  follows: 

1.  Preparation  of  work  plan 

2.  Analysis  of  missions  and  functions 

3.  Selection  of  tasks  for  training 

4.  Analysis  of  tasks 

5.  Generation  of  general  course  structure 

6.  Generation  of  training  program  alternatives 

7.  Development  of  extended  POI 

8.  Analysis  of  training  effectiveness  and  trainability 

9.  Analysis  of  training  costs 

10.  Final  trade-off  analysis 

The first seven sets of activities are preliminary, but absolutely
essential, to the actual conduct of the cost and effectiveness analyses.
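
The ordering of the ten blocks, and the point that the first seven must be
finished before the cost and effectiveness analyses proper, can be written
down as a small data structure. This is purely an illustrative sketch: the
class, the `ready_for_analysis` helper, and the idea of tracking completed
block numbers are assumptions made for the example, not part of the ASA
procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    name: str

    @property
    def preliminary(self):
        # The text describes blocks 1 through 7 as preliminary to the
        # cost and effectiveness analyses.
        return self.number <= 7


CTEA_BLOCKS = [
    Block(i, name)
    for i, name in enumerate(
        [
            "Preparation of work plan",
            "Analysis of missions and functions",
            "Selection of tasks for training",
            "Analysis of tasks",
            "Generation of general course structure",
            "Generation of training program alternatives",
            "Development of extended POI",
            "Analysis of training effectiveness and trainability",
            "Analysis of training costs",
            "Final trade-off analysis",
        ],
        start=1,
    )
]


def ready_for_analysis(completed):
    """True once every preliminary block number appears in `completed`."""
    return all(b.number in completed for b in CTEA_BLOCKS if b.preliminary)


print(ready_for_analysis({1, 2, 3}))           # some preliminary blocks missing
print(ready_for_analysis(set(range(1, 8))))    # blocks 1-7 all completed
```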

The methodology represents an integration of various elements of the
state-of-the-art in instructional technology, course design, and collective
and individual front-end analyses. The first seven sets of activities are
described as preliminary in that they produce the various assumptions,
constraints, training development materials, decision data and information
required for the generation of training effectiveness estimates and the
determination of cost differences for the specified training program
options. These seven sets of activities are the key to the ASA CTEA
methodology. The rationale for our emphasis on the FEA approach draws upon
the purpose of CTEA and its relationship to the materiel system development
process.

The  specific  nature  of  a  CTEA  often  is  dependent  upon  the  state  of 
development  of  the  materiel  system  under  study.  For  conceptual  materiel 
systems,  the  lack  of  performance  data  will  mean  that  CTEA  primarily  will
be  used  to  forecast  training  resource  requirements  and  to  indicate  poten¬ 
tial  problem  areas  in  the  training  program.  In  other  words,  a  CTEA  begun 
early  in  the  materiel  development  process  will  identify  training  issues 
that  may  require  special  examination  during  later  developmental  and  opera¬ 
tional  testing  (DT/OT).  As  prototypes  of  the  materiel  system  become  avail¬ 
able,  CTEA  would  involve  updating  and  validating  cost  and  resource  impact 
projections,  and  analytical  investigations  of  training  effectiveness. 
Following  the  field  deployment  of  the  materiel  system,  the  emphasis  of  CTEA 
would  be  on  the  cost  effectiveness  of  (1)  training  "fixes"  designed  to 
address  training  deficiencies,  or  (2)  training  modifications  designed  to 
meet  an  altered  threat  scenario  or  to  accommodate  evolutionary  hardware 
modifications. 

Preparation  of  Work  Plan 

Once the status of the LCSMM for the materiel system has been determined,
all relevant documentation for the system and for antecedent and similar
systems is identified and located. This is a very important step for the
CTEA in that the documentation provides a definitive basis from which to
launch the study. As the LCSMM process proceeds, additional documentation
and changes to the initial set of materials must be obtained. This material
is assembled, cataloged for future CTEA activities, and reviewed. The
initial and subsequent sets of documents become the first element of the
audit trail. They also provide the beginning list of constraints that must
be dealt with.

The locations of the equipment and of personnel familiar with the equipment
are next determined. This is necessary in order to prepare the detailed
work plan. Interviews with these subject matter experts (SMEs) are scheduled
as required in the CTEA process. The detailed plan generally follows
the CTEA block diagram of major activities. Forty-nine specific steps are
described in the detailed plan to reflect the support requirements, time
schedules, data analyses, and output products.


Mission  and  Function  Analysis 


Concurrently,  work  can  begin  on  the  specification  of  the  initial  job 
task  list.  This  begins  with  a  review  of  the  documentation  to  identify  the 
materiel  system  missions.  Missions  are  analyzed  into  functions,  which  are 
major  chunks  of  mission  activities  usually  assigned  to  specific  job  posi¬ 
tions.  Where  job  descriptions  already  exist  and  MOSs  have  been  designated 
for  the  positions,  the  mission  analysis  is  often  not  necessary. 

ASA  uses  systematic  sets  of  procedures  for  analyzing  categories  of 
functions.  The  functions  are  analyzed  with  the  assistance  of  SMEs  who  pro¬ 
vide  specific  responses  to  questions  which  pertain  to  the  function  being 
analyzed.  SMEs  identify  specific  actions  performed  on  specific  objects. 
These  responses  are  subsequently  written  as  task  statements.  This  analysis 
process  produces  the  initial  set  of  tasks,  which  is  the  second  audit  trail 
item. 

Selection  of  Tasks  for  Training 


Since training time is limited, all job tasks usually cannot be included
in training. Using guidelines from the ISD process, ASA developed a set of
procedures that first selects the tasks that should be trained and then
specifies the level of training that is required. This information has two
CTEA benefits: (1) a rational, objective basis is provided for selecting
tasks for training; and (2) information is generated for designing the entire
training system for the MOS, to include resident, unit, refresher,
maintenance, and wartime preparatory training requirements. The information
for making these decisions comes from the analysis of the criticality of each
task. Less critical tasks require only the use of simple procedures or
general skills and thus may not need to be trained. Remaining tasks can be
ordered in terms of the ten criticality dimensions that are used:

1.  Learning  difficulty 

2.  Performance  difficulty 

3.  Time  delay  tolerance 

4.  Consequences  of  inadequate  performance

5.  Immediacy  of  performance 

6.  Civilian  acquired  skills 

7.  Task  importance 

8.  Frequency  of  performance 

9.  Wartime  task

10.  Task  decay  rate

Each item is assessed on a three-level scale by SMEs. The responses are
then subjected to logical analysis using a computer routine. Each task is
classified into one of the following categories:

1.  Requires  certification  training 

2.  Requires  qualification  training 

3.  Requires  wartime  refresher  training 

4.  Can  be  considered  for  on  the  job  training 

5.  Requires  maintenance  of  proficiency  training 

6.  Can  be  considered  for  reduction  of  training  time 

7.  Can  be  considered  for  elimination  from  training 

Analysis  of  Tasks 

The  tasks  that  are  selected  for  training  are  next  reviewed  by  SMEs  who 
provide  task  content  and  context  information.  Four  sets  of  information  are 
obtained — performance  standards;  unusual  job  conditions;  stimuli  that  must  be 
attended  to  during  task  performance;  and  the  skills  required  for  performing 
the  task.  The  products  of  this  analysis  are  the  performance  objectives  which 
become  the  terminal  learning  objectives,  the  functional  learning  requirements 
and  the  instructional  delivery  systems  that  are  appropriate  for  each  task. 

Generation  of  General  Course  Structure 


This analysis requires SMEs to provide inter-task dependency information
for each task. SMEs identify the tasks that must be mastered before a
specified task can be carried out. The entire list of tasks selected for
training is reviewed, and 3-5 subordinate tasks are identified for most of
the tasks. There are always several tasks near the bottom of the hierarchy
for which only one or two subordinate tasks are designated, and one or two
that fall at the very bottom and so have no subordinate tasks. The hierarchy
information is subjected to analysis by a computer routine, which provides a
printout presenting the relationships between all tasks and skills.

ASA's approach to the structuring of training programs stems from our
job structure philosophy: job tasks are accomplished by the application
of skills and the carrying out of simple procedures. Simple procedures can
be carried out by merely following written or verbal instructions. Special
and general skills are acquired through practice and thus must be trained.
General skills are usually already in the behavior repertoire of the general
student population and usually do not need to be trained. Special skills
thus are the primary substance of training programs. Skills are applied
across work tasks and so should be taught in the same manner. In ASA's CTEA,
the general course structure is determined through an iterative process
beginning with the integration of the task and skill hierarchies. Tasks
that require skills falling at the bottom of the skill hierarchy are
placed early in the course structure. As you move through the course
structure, more new skills are introduced. Tasks placed later may include
both skills that have already been taught and skills that have not yet been
introduced. The "new" skills are then trained, and the "old" skills, already
taught, are practiced.
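
The sequencing rule just described can be illustrated with a short Python
sketch. It is not ASA's computer routine, whose details are not given here;
it only shows one way to place subordinate tasks first and then label each
task's skills as "new" (to be trained) or "old" (to be practiced). The task
names, skill names, and the functions course_order and label_skills are all
hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical SME data: each task maps to the subordinate tasks that
# must be mastered before it can be performed.
SUBORDINATE_TASKS = {
    "power_up": set(),
    "read_symbols": set(),
    "operate_display": {"power_up", "read_symbols"},
    "track_target": {"operate_display"},
}

# Hypothetical skills drawn on by each task.
TASK_SKILLS = {
    "power_up": ["switch_sequence"],
    "read_symbols": ["symbol_recognition"],
    "operate_display": ["symbol_recognition", "cursor_control"],
    "track_target": ["cursor_control", "rate_estimation"],
}


def course_order(subordinate_tasks):
    """Return the tasks in an order that places every subordinate task first."""
    return list(TopologicalSorter(subordinate_tasks).static_order())


def label_skills(order, task_skills):
    """For each task in course order, split its skills into new and old."""
    taught = set()
    plan = []
    for task in order:
        new = [s for s in task_skills[task] if s not in taught]
        old = [s for s in task_skills[task] if s in taught]
        plan.append((task, new, old))
        taught.update(new)
    return plan


order = course_order(SUBORDINATE_TASKS)
plan = label_skills(order, TASK_SKILLS)
for task, new, old in plan:
    print(f"{task}: train {new}, practice {old}")
```

A real course structure would then be adjusted for the school constraints
discussed next; the sketch covers only the ideal ordering.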

The  course  structure  is  modified  from  the  ideal  according  to  the  spe¬ 
cific  constraints  that  apply  at  the  proponent  school.  Constraints  are 
issues  that  impact  the  design,  delivery  and  management  of  a  course.  These 
issues  are  identified  throughout  the  CTEA  study  in  interviews  with  school 
personnel,  TRADOC  Systems  Management  (TSM)  personnel,  and  in  the  reviews 
of  antecedent  or  similar  systems  training  materials.  By  the  time  that  the 
general  course  structure  is  generated  almost  all  of  the  constraints  have 
been  identified.  These  include  the  following: 

1.  Available  time  and  resources 

2.  Instructors — number  and  qualifications 

3.  Facilities 

4.  Management  structure 

5.  Student  population 

6.  Student  load 

7.  Course  content 

8.  Instructional  method  philosophy 

9.  Media  and  training  devices  available 

Generation  of  Training  Program  Alternatives 

Once a general course structure has been developed, the next step is
to specify training program alternatives that could be used to accomplish
the training objectives. The variables that can be manipulated include the
instructional delivery system, the training time (peacetime, shortened
peacetime, mobilization time), logistical support, types of simulators,
student characteristics and numbers, facilities, and numbers of instructors.
The actual selection of training alternatives does not involve a free choice
of alternative variables or levels of variables. School philosophies,
doctrine, and local SOPs, in addition to any training structure determined
during the materiel system conceptualization, narrow the possible
alternatives to a relatively small list.

Two approaches can be used to select alternatives. The first is to
identify two extreme levels of a few variables that would have the greatest
impact on cost and effectiveness estimates and then propose the two extreme
combinations as the alternatives. The second approach would be to select
alternatives that would be most acceptable to the school personnel, under
their constrained conditions, and one or more ideal alternatives that
essentially ignore the constraints. The ideal, if it proved to be the most
cost-effective, would be an alternative that the school can plan
subsequently to work towards.

Development  of  Extended  POI 

The  last  preliminary  analytical  block  of  activities  is  the  development 
of  an  extended  POI.  The  POI  is  the  primary  product  of  the  first  seven  sets 
of  CTEA  activities.  It  is  one  document  that  provides  almost  all  of  the  in¬ 
formation  needed  for  course  development.  It  is  the  course  design.  It  is 
designed  so  it  can  be  used  as  it  is,  intact,  or  taken  apart  and  recombined 
to  meet  individual  course  developer  needs,  bias  or  whatever. 

Each  page  of  the  POI  covers  one  task.  The  performance  objective  is 
written  out.  New  and  old  skills  are  specified,  which  reflects  the  ordered 
structure  of  the  entire  POI.  The  instruction  method  and  delivery  medium  are 
listed.  And  the  instruction-to-practice  time  relationship  is  presented. 

An  extended  POI  is  prepared  for  each  training  program  alternative,  wherever 
necessary.  Quite  often  different  training  program  alternatives  can  use  the 
same  POI,  since  the  dimensions  determining  differences  are  not  all  related 
to  the  instructional  process.  For  example,  where  operational  equipment  is 
to  be  used  that  presents  a  display  to  the  operator,  various  means  can  be 
used  to  generate  the  input  signal.  The  actual  stimulus  source  could  be  used, 
or  any  of  a  number  of  simulators  could  be  used.  As  far  as  training  the 
operator,  however,  it  would  make  absolutely  no  difference  how  the  display 
signal is generated, since the student would not see the stimulus source.
But tremendous differences in costs could exist between the possible signal
generators, each of which would be considered a different training program
alternative.

Analysis  of  Training  Effectiveness  and  Trainability

The POI is used as the primary stimulus for eliciting estimates of
training effectiveness. ASA uses a forecasting approach for obtaining these
estimates. Instructional experts are used as SMEs for estimating the
effectiveness of training each task. The SMEs use the POI to provide the set
for their estimates. The general instructions to the SMEs stress that each
element of the POI must be reviewed and then kept in mind when estimating
effectiveness. Two dependent variables can be used to estimate the utility
of a training program: training time or training effectiveness. One of
these is usually held constant, while the other is allowed to vary. Usually
time is held constant and the effectiveness variable is manipulated by
altering training program variables. In the two Army CTEA projects, we
did just that: we held time constant and estimated effectiveness.

Actually the process is iterative in nature. Several questions are
asked of SMEs. The POI is first presented without training time.
Instructional design experts are asked to indicate how long it would take to
train the task under normal conditions. They then indicate the minimum time
for training under wartime conditions, and also provide a maximum time they
would use if no time constraints were imposed.

The next step is to have SMEs estimate the effectiveness of training
each task using all the information on the POI. In our Army projects,
effectiveness was defined as the percentage of students that would reach the
performance criterion under each time condition for each training
alternative.

These  effectiveness  estimates  are  used  to  assess  the  trainability  of 
each  task.  A  task  is  trainable  if  the  required  percentage  of  students  can 
reach  the  performance  criterion  level  within  the  normal  time  frames  to  be 
established  in  the  training  program.  By  obtaining  effectiveness  estimates 
for  each  task,  decisions  can  be  made  subsequently  about  program  modifica¬ 
tions  for  tasks  that  do  not  meet  the  trainability  criterion. 

Task criticality dimensions are used to provide a more quantitative
basis for the comparison of total training program effectiveness. Tasks
differ in terms of their training worth. ASA defines training worth in
terms of five of the ten criticality dimensions used to select tasks for
training. These five dimensions were scaled on the basis of the information
utility they provide for determining the worth of including a task in
a training program. These dimensions and their scale values are:


                                              Worth Value

1.  Consequences of inadequate performance        .45
2.  Task importance                               .26
3.  Time delay tolerance                          .16
4.  Frequency of performance                      .08
5.  Immediacy of performance                      .05

Each of the three levels of these dimensions is weighted. By multiplying
the worth value by the response weight and by the effectiveness estimate, a
figure of merit value is determined for each training program alternative.
The results of this process are training effectiveness ratings for the
programs, which are then compared by determining the relative effectiveness.
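
As a rough illustration of this calculation, the Python sketch below combines
the five worth values listed above with response weights and effectiveness
estimates. The worth values come from the text; everything else is assumed
for the example: the response weights for the three levels, the summation
over dimensions and tasks, the task ratings, the effectiveness estimates, and
the function names.

```python
# Worth values for the five criticality dimensions, as listed in the text.
WORTH = {
    "consequences_of_inadequate_performance": 0.45,
    "task_importance": 0.26,
    "time_delay_tolerance": 0.16,
    "frequency_of_performance": 0.08,
    "immediacy_of_performance": 0.05,
}

# ASSUMPTION: the weights of the three response levels are not given in
# the text; these values are placeholders.
LEVEL_WEIGHT = {1: 0.0, 2: 0.5, 3: 1.0}


def task_worth(ratings):
    """Sum of worth value x response weight over the five dimensions."""
    return sum(WORTH[d] * LEVEL_WEIGHT[ratings[d]] for d in WORTH)


def figure_of_merit(task_ratings, effectiveness):
    """Sum of task worth x effectiveness estimate over all tasks (assumed)."""
    return sum(task_worth(task_ratings[t]) * effectiveness[t]
               for t in task_ratings)


def relative_effectiveness(merits, base):
    """Express each program's figure of merit as a ratio to a base program."""
    return {program: m / merits[base] for program, m in merits.items()}


# Hypothetical SME ratings (levels 1-3) for two tasks.
TASK_RATINGS = {
    "task_a": {d: 3 for d in WORTH},
    "task_b": {d: 2 for d in WORTH},
}

# Hypothetical proportions of students reaching criterion, per program.
EFFECTIVENESS = {
    "program_x": {"task_a": 0.9, "task_b": 0.8},
    "program_y": {"task_a": 0.6, "task_b": 1.0},
}

merits = {p: figure_of_merit(TASK_RATINGS, e) for p, e in EFFECTIVENESS.items()}
ratios = relative_effectiveness(merits, base="program_x")
print(merits, ratios)
```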

Analysis  of  Training  Costs 

The training cost analysis requires a rather detailed analysis of the
training support requirements for each program. Specific requirements that
are identified are the numbers and types of support personnel and the hours
they will be required during the entire course; equipment that must be
dedicated or borrowed and the hours it will be used; materials to directly or
indirectly support the training; and facilities, to include classrooms,
offices, maintenance shops, quarters, etc. The cost analysis approach that is
used is often termed an incremental cost analysis. Since only relative cost
comparisons are necessary to meet the CTEA purposes, all common costing
elements are identified and eliminated from further consideration. Only those
elements or levels of elements that are unique to a program are subsequently
costed. Two kinds of costs are obtained: exact costs for unique equipment and
materials, and average or scaled costs for personnel and common equipment
items, which are taken from cost data bases that are kept current by several
agencies.


The  total  unique  costs  are  determined  for  training  program  development, 
training  program  implementation  and  training  program  maintenance.  These 
costs  are  then  combined  to  provide  a  figure  for  a  one  year  period.  Next, 
life  cycle  costs  for  the  training  program  are  figured.  These  figures  re¬ 
flect  only  the  cost  of  the  unique  training  support  requirements  for  each  pro¬ 
gram.  There  is  no  intent  here  to  provide  an  absolute  cost  estimate,  since 
the  purpose  of  a  CTEA  is  to  provide  input  to  the  decision  of  which  training 
approach  should  be  used.  The  last  step  is  to  determine  relative  program  costs 
by  forming  ratios  of  cost  figures,  using  one  program  as  the  base  costing  figure. 
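
The incremental approach can be sketched in a few lines of Python. The
example below is only illustrative: the program names, cost elements, and
dollar figures are hypothetical, and the functions unique_costs and
relative_costs are names invented for the sketch. It drops every element
that all programs share at the same cost, totals what is left, and forms
ratios against one program chosen as the base.

```python
def unique_costs(programs):
    """Total only the cost elements that differ between programs.

    An element is treated as common, and dropped, when every program
    includes it at the same cost.
    """
    names = list(programs)
    shared = set.intersection(*(set(programs[n]) for n in names))
    common = {e for e in shared
              if len({programs[n][e] for n in names}) == 1}
    return {n: sum(cost for e, cost in programs[n].items() if e not in common)
            for n in names}


def relative_costs(totals, base):
    """Express each program's unique cost as a ratio to the base program."""
    return {n: t / totals[base] for n, t in totals.items()}


# Hypothetical one-year costs for two alternatives that differ only in
# how the display signal is generated.
PROGRAMS = {
    "actual_signal_source": {
        "instructors": 120_000,
        "classrooms": 30_000,
        "signal_generation": 600_000,
    },
    "simulator": {
        "instructors": 120_000,
        "classrooms": 30_000,
        "signal_generation": 150_000,
    },
}

totals = unique_costs(PROGRAMS)
ratios = relative_costs(totals, base="simulator")
print(totals, ratios)
```

Because the common elements never enter the totals, the resulting figures are
useful only for comparing alternatives, not as absolute cost estimates.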

Final  Trade-Off  Analysis 

The final CTEA step is to compare all program alternatives on both the
relative effectiveness and relative cost figures. Since both effectiveness
and cost data are relative figures and the cost ratios are based upon
incremental cost figures, it can be somewhat misleading to combine the two
sets of data into one figure of merit. A total figure would hide information
that would be significant in selecting a program. Since both effectiveness
and cost figures are relative, a tabular display is more appropriate, as seen
here.

Because of the structure of the POI and the detailed descriptions of
the training support requirements for the different programs, modifications
can readily be made to any program to alter the effectiveness and/or cost
ratings. Any modification to the program would require subsequent changes
to the POI and course structure, but this requires relatively little
further resource expenditure as compared to the initial program design.
In other words, fine tuning of a program to reduce costs or increase
effectiveness is quite possible, bringing the realistic program towards an
ideal cost/effectiveness level.

In summary, ASA's CTEA heavily emphasizes a detailed analysis of the
job and the requirements for training that job prior to the actual analysis
of the effectiveness and costs of alternative training programs. The seven
sets of preliminary activities generate information and data that are also
of use to other users in the training development system.

Prediction of Boot Camp Attrition:
IRT vs. Number-Right Scoring


Item Response Theory (IRT) appears to be the wave of the near
future in testing. Although the IRT literature is voluminous, it seems
that there are, to date, virtually no data on the improvement in
predictive validity from IRT scoring as compared to number-right
scoring. The US Coast Guard uses the Coast Guard Selection Test (CGST)
for enlistment screening. The CGST is a battery of three tests:
verbal ability, arithmetic ability, and mechanical comprehension. A
large sample (ca. 2200) of Coast Guard enlistees was tracked through
boot camp to obtain the criterion data on completion/attrition. For
this sample, number-right scoring and IRT scoring were applied to each
of the three subtests of the CGST. Comparisons were made on various
indices of predictive validity between the two types of scoring.
Implications of the findings for improving military selection
procedures were discussed.

PREDICTION OF BOOT CAMP ATTRITION: IRT VS. NUMBER-RIGHT SCORING


Robert L. Frey, Jr.          Karen Jones
USCG Headquarters            USCG Institute

Item Response Theory (IRT) appears to be the wave of the near future in testing.
Although the IRT literature is voluminous, the reported research comparing
real-world predictive validity of IRT scoring vs. number-right scoring seems
to be sparse. In the opinion of the authors, there is now a need for a number
of such comparative studies in different organizations and settings. Only in
this way will we obtain the necessary overview to determine the practical
utility of IRT scoring.

A detailed discussion of item response theory is not germane to the purpose of
this study. However, Weiss and Davison (1981) present an overview of testing
theories which includes a discussion of IRT research dealing with problems not
resolvable with classical testing theory. Suffice it to say that the major
reason for our keen interest in the IRT scoring of conventional tests is the
expectation that this procedure will result in improved predictive validity.

In brief, we have this expectation because of the superiority of IRT scoring.
IRT scoring provides estimates which are sample invariant and take into account
each test item's characteristic curve. Consequently, IRT scoring results in a
much more accurate estimate of one's ability level on the dimension in question
(e.g., verbal, arithmetic) than number-right scoring. A logical inference, then,
is that IRT scores should demonstrate better predictive validity than
number-right scoring of the same test.

Specifically, this study compared number-right scoring vs. IRT scoring of the
Coast Guard enlistment screening test in the prediction of boot camp attrition.
The test battery used by the Coast Guard for enlistment screening is called the
Coast Guard Selection Test (CGST). The CGST has three subtests: 1) General
Classification Test (GCT), a verbal performance test; 2) Arithmetic (ARI); and
3) Mechanical Comprehension (MECH).

Method  and  Results 

CGST  subtest  scores  were  obtained  from  a  sample  of  2138  Coast  Guard  recruits. 
Number-right  scoring  is  the  only  method  used  for  operational  purposes.  There 
were  three  number-right  scores  for  each  person:  1)  GCT,  2)  ARI,  and  3)  MECH. 

The  raw  scores  were  converted  to  Navy  Standard  Scores  using  equicentile 
calibration  with  the  Navy  BTB,  Form  6.  Navy  Standard  Scores  have  a  theoretical 
mean  of  50  and  a  standard  deviation  of  10  on  an  unrestricted  population  of 
service-age  youth. 

IRT  Estimates  of  Examinee  Ability

IRT estimates of examinees' ability on each of the three tests (GCT, ARI, and
MECH) were obtained using estimates of item parameters and a Bayesian modal
estimate scoring program. The latent trait or item response theory model
selected to define the item characteristic curve or the regression of each
binary-scored multiple choice item on latent ability was Birnbaum's (1968)
logistic three parameter model. With this model, the item parameters are:
item discrimination, a_i; item difficulty, b_i; and item coefficient of
guessing or the lower asymptote, c_i (Hambleton and Cook, 1977; Urry, 1977).
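
For readers who want to see the model in concrete form, the Python sketch
below writes out the three-parameter logistic item characteristic curve,
P(theta) = c + (1 - c) / (1 + exp(-D a (theta - b))), and a simple Bayesian
modal (posterior mode) ability estimate. It is not the scoring program used
in this study. The scaling constant D = 1.7, the standard normal prior, the
grid search, the item parameter values, and the function names are all
assumptions made for illustration.

```python
import math

D = 1.7  # ASSUMPTION: conventional scaling constant for the logistic model


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def bayesian_modal_estimate(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search the theta that maximizes the log posterior.

    responses: list of 0/1 item scores.
    items: list of (a, b, c) tuples, one per item.
    ASSUMPTION: a standard normal prior on theta.
    """
    best_theta, best_value = lo, -math.inf
    n_steps = int(round((hi - lo) / step))
    for k in range(n_steps + 1):
        theta = lo + k * step
        value = -0.5 * theta * theta  # log N(0, 1) prior, up to a constant
        for u, (a, b, c) in zip(responses, items):
            p = p_correct(theta, a, b, c)
            value += math.log(p) if u else math.log(1.0 - p)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta


# Hypothetical item parameters (a, b, c) for a three-item test.
ITEMS = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]

theta_all_right = bayesian_modal_estimate([1, 1, 1], ITEMS)
theta_all_wrong = bayesian_modal_estimate([0, 0, 0], ITEMS)
print(theta_all_right, theta_all_wrong)
```

Two examinees with the same number-right score can receive different
estimates under this model, because which items were answered correctly
matters as well as how many.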




The item parameters were estimated using the computer program ANCILLES. Since
an earlier version of this program is discussed by Urry (1975), the discussion
of the program in this paper will be limited to what is required to enable the
reader to understand how the program was used. The item parameters were
estimated for each of the tests separately. That is, the GCT, ARI, and MECH
were analyzed separately as tests of 50, 35, and 25 items, respectively.

ANCILLES provides the user with the option of the normal ogive or the logistic
model. The logistic model was selected. The item parameters are estimated by
an iterative minimum chi-square procedure in two stages, and the user can obtain
item parameters from either or both stages. In the first stage, the
distribution of ability is represented by corrected raw scores, and in the second
stage the distribution of ability is represented by Bayesian modal estimates of
ability. The item parameters used in this investigation were from the
second stage. When the item parameters were estimated, two items (one on GCT
and one on ARI) did not fit the model, and ANCILLES did not estimate item
parameters for them. Therefore, these items were omitted from computations in
the next phase.

In this phase of the investigation, estimates of examinee ability were obtained
using the item parameters provided by ANCILLES and a Bayesian modal estimate
scoring program. Apparently, ANCILLES can be modified to provide estimates of
examinee ability in addition to the estimates of item parameters. However,
rather than modifying the program, a separate scoring program was used. Each
test was again analyzed separately and, using the appropriate set of item
parameter estimates, estimates of ability on the GCT, ARI, and MECH for each
examinee were obtained.*

After the completion of recruit training, criterion data were obtained for each
person, that is, whether or not the individual successfully completed recruit
training. Before the predictive validity results are presented, a few
descriptive statistics on the NSS scores and the IRT scores will be noted.


The NSS statistics were:

          Mean      S.D.

GCT      55.08      8.49
ARI      49.60      7.44
MECH     50.37      7.82

The IRT statistics (theta estimates) were:

          Mean      S.D.

GCT      -0.02      0.83
ARI       0.04      0.86
MECH     -0.06      0.72

The  NSS  GCT  score  correlated  .9897  with  the  IRT  GCT  score.  The  NSS  ARI  score 
correlated  .9786  with  the  IRT  ARI  score.  The  NSS  MECH  score  correlated  .9735 
with  the  IRT  MECH  score. 

In  order  to  make  interpretation  easier,  the  IRT  scores  were  then  transformed  to 
have  the  same  means  and  standard  deviations  as  the  NSS  scores.  All  subsequent 
findings  for  the  IRT  scores  are  based  on  this  converted  scale.  In  spite  of  the 
extremely  high  correlations  between  the  IRT  and  NSS  scores,  however,  the  variance 
of  the  IRT  scores  at  each  NSS  score  level  was  not  insignificant.  The  average 
standard  deviation  of  the  IRT  GCT  at  each  NSS  level  was  1.16.  The  average 
standard  deviation  of  the  IRT  ARI  scores  at  each  NSS  level  was  1.40.  The 
average  standard  deviation  of  the  IRT  MECH  scores  at  each  NSS  level  was  1.72. 
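
The rescaling described in this paragraph amounts to a linear transformation
of each theta estimate. The Python sketch below shows the arithmetic using
the sample means and standard deviations reported above; the function name
to_nss_scale is hypothetical, and the sketch assumes the conversion was a
simple match of means and standard deviations within each subtest, as the
text indicates.

```python
# Sample means and standard deviations reported in the text.
NSS_STATS = {"GCT": (55.08, 8.49), "ARI": (49.60, 7.44), "MECH": (50.37, 7.82)}
THETA_STATS = {"GCT": (-0.02, 0.83), "ARI": (0.04, 0.86), "MECH": (-0.06, 0.72)}


def to_nss_scale(theta, subtest):
    """Linearly rescale a theta estimate to the NSS mean and S.D."""
    theta_mean, theta_sd = THETA_STATS[subtest]
    nss_mean, nss_sd = NSS_STATS[subtest]
    z = (theta - theta_mean) / theta_sd  # standardize within the sample
    return nss_mean + nss_sd * z


# A theta at the sample mean maps to the NSS mean; a theta one standard
# deviation above the mean maps to the NSS mean plus one NSS S.D.
print(to_nss_scale(-0.02, "GCT"))
print(to_nss_scale(0.81, "GCT"))
```

A linear transformation of this kind leaves the correlations between IRT and
NSS scores unchanged, so it affects only the units in which the IRT scores
are read.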

*The  authors  wish  to  acknowledge  Thomas  A.  Warm  for  his  programming  assistance 
which  enabled  them  to  use  ANCILLES  and  the  Bayesian  modal  estimate  scoring 
program. 


Even more noteworthy are the ranges of the IRT scores at each NSS level. The
range of the IRT GCT scores is as large as 9 points at some of the NSS levels.
The range of the IRT ARI scores is as large as 12 points at some of the NSS
levels. The range of the IRT MECH scores is as large as 13 points at some of
the NSS levels. In other words, at some NSS levels, people with the same
number-right scores have IRT scores which differ by as much as 1.66 standard
deviations of the entire sample (13 points on MECH, whose sample standard
deviation is 7.82). This result potentially allows for meaningful differences
when comparing predictive validities.

A number of different analyses were done to compare the predictive validity of
IRT scores vs. NSS scores. In the first analysis, the sample was stratified
into groups based on their mental categories. The groups were: 1) CAT IV
(AFQT score of 10-30), 2) CAT III B (AFQT score of 31-48), 3) CAT III A (AFQT
score of 49-64), 4) CAT II and CAT I (AFQT score of 65-99). This was done on
the basis of scores on the GCT+ARI composite. (The GCT+ARI composite is
essentially a basic-skills achievement test.) Using the IRT scores, the results
were as follows for the total sample:


                N      Attrition Rate

CAT IV         342         19.0%
CAT III B      457         10.1%
CAT III A      522          7.3%
CAT II + I     817          4.4%
Total         2138          8.7%

Using  the  NSS  scores,  the  results  were  as  follows  for  the  total  sample: 


                N      Attrition Rate

CAT IV         335         19.4%
CAT III B      425          9.1%
CAT III A      509          8.5%
CAT II + I     842          4.3%
Total         2138          8.7%

As can be seen from the above tables, there was practically no difference in
attrition rates using IRT scores vs. NSS scores. The IRT CAT III B and CAT III A
groups do seem to show better differentiation in attrition rates (10.1% vs. 7.3%)
than do the NSS groups (9.1% vs. 8.5%). Even if this result were to hold up on
a new sample, there would be minimal practical significance, however.
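
The arithmetic behind these comparisons is easy to reproduce. The short
Python sketch below uses the group sizes and attrition rates from the IRT
table above to recover the overall attrition rate, and then computes the gap
between the CAT III B and CAT III A rates under each scoring method. The
function names are hypothetical, and because the published rates are rounded
to one decimal place, the recovered figures are approximate.

```python
# Group sizes and attrition rates from the IRT table (GCT+ARI composite).
IRT_GROUPS = {
    "CAT IV": (342, 0.190),
    "CAT III B": (457, 0.101),
    "CAT III A": (522, 0.073),
    "CAT II + I": (817, 0.044),
}


def overall_rate(groups):
    """Attrition rate for the whole sample, weighting each group by its N."""
    total_n = sum(n for n, _ in groups.values())
    attritees = sum(n * rate for n, rate in groups.values())
    return attritees / total_n


def gap_in_points(rate_b, rate_a):
    """Difference between two attrition rates, in percentage points."""
    return round((rate_b - rate_a) * 100, 1)


total_n = sum(n for n, _ in IRT_GROUPS.values())
overall = round(overall_rate(IRT_GROUPS) * 100, 1)
irt_gap = gap_in_points(0.101, 0.073)  # IRT: CAT III B vs. CAT III A
nss_gap = gap_in_points(0.091, 0.085)  # NSS: CAT III B vs. CAT III A
print(total_n, overall, irt_gap, nss_gap)
```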


In the next analysis, the group was again divided by mental categories. The
GCT + ARI + MECH composite was used instead of the GCT + ARI composite. The
results for the IRT scores were as follows:


                N      Attrition Rate

CAT IV         318         18.9%
CAT III B      536         11.6%
CAT III A      571          6.8%
CAT II + I     713          3.4%
Total         2138          8.7%


The  results  for  the  NSS  scores  were  as  follows: 

                 N      Attrition Rate

CAT IV          303         18.8%
CAT III B       543         12.9%
CAT III A       563          6.2%
CAT II + I      729          3.8%

Total          2138          8.7%

This  time,  the  results  are  virtually  the  same  for  the  IRT  scores  and  the  NSS 
scores.  In  this  set  of  data,  it  seems  that  IRT  scoring  does  not  provide  better 
predictive  validity  than  number-right  scoring  when  predicting  boot  camp  attrition 
as  a  function  of  mental  categories. 

Next, discriminant function analyses were done. In brief, for the two-group
case, discriminant function analysis is a multivariate procedure which finds
the one best linear combination of variables, that is, the combination giving
the largest possible statistical separation between the groups. The procedure
takes into account the vector of group mean differences and the correlations
amongst the variables in determining the optimal weights for the linear
combination. The hope is that a prediction equation will be found that makes
possible significantly more accuracy than is obtainable from knowledge of the
base rates alone. In this case, the two groups are graduates (91.3%) and
attritees (8.7%) and the variables are GCT, ARI, and MECH. For the IRT scores,
the mean scores were as follows:


            MEAN SCORES FOR EACH GROUP

GROUP          GCT       ARI       MECH

GRADUATES      55.5      50.0      50.7
ATTRITEES      50.6      45.8      46.4
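To make the two-group procedure concrete, the following is a minimal sketch of
Fisher's linear discriminant, in which the weights are the inverse of the pooled
within-group covariance matrix applied to the vector of mean differences. The
group means are those reported for the IRT scores. The covariance matrix is NOT
reported in this paper; the one used here (standard deviation of 10 on every
test, correlation of .60 between every pair) is a purely hypothetical stand-in,
so the resulting weights are illustrative and are not the weights obtained in
the study.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Group means (IRT scores), ordered GCT, ARI, MECH.
grad_means = [55.5, 50.0, 50.7]
attr_means = [50.6, 45.8, 46.4]

# HYPOTHETICAL pooled within-group covariance matrix (not from the paper).
sd, r = 10.0, 0.60
pooled_cov = [[sd * sd if i == j else r * sd * sd for j in range(3)]
              for i in range(3)]

# Fisher's rule: weights = inverse(pooled covariance) * (mean difference).
mean_diff = [g - a for g, a in zip(grad_means, attr_means)]
weights = solve(pooled_cov, mean_diff)


def score(x):
    """Discriminant score: weighted sum of the three test scores."""
    return sum(w * v for w, v in zip(weights, x))


# With equal prior probabilities the cutoff is midway between the group centroids.
cutoff = (score(grad_means) + score(attr_means)) / 2.0
print([round(w, 4) for w in weights], round(cutoff, 3))
```

A recruit whose discriminant score falls above the cutoff would be classified as
a graduate and one below it as an attritee. Unequal prior probabilities move the
cutoff away from the midpoint.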

As just mentioned, the major interest is in the accuracy of prediction possible
based on the linear discriminant function derived from the analysis. The first
classification table assumes equal prior probabilities for each group:

                 PREDICTED GROUP (IRT SCORES)

TRUE GROUP     GRADUATE    ATTRITEE    TOTAL

GRADUATE         1240         713       1953
ATTRITEE           61         124        185

TOTAL            1301         837       2138

In this case, 1364 observations (63.8%) were classified correctly. Obviously,
this is quite undesirable since a blind "hit" rate of 91.3% is possible simply
by classifying everyone as a graduate.
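The comparison between the achieved "hit" rate and the blind "hit" rate can be
reproduced directly from the classification table. The sketch below uses the
cell counts reported for the IRT scores under equal prior probabilities; the
function names are illustrative.

```python
def hit_rate(table):
    """Proportion of cases classified correctly.

    table[i][j] is the number of cases whose true group is i and whose
    predicted group is j.
    """
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total


def base_rate(table):
    """Hit rate obtained by assigning every case to the largest true group."""
    row_totals = [sum(row) for row in table]
    return max(row_totals) / sum(row_totals)


# Rows: true graduate, true attritee.
# Columns: predicted graduate, predicted attritee.
irt_equal_priors = [[1240, 713], [61, 124]]

print(round(100 * hit_rate(irt_equal_priors), 1))   # 63.8
print(round(100 * base_rate(irt_equal_priors), 1))  # 91.3
```

The same table shows where the errors fall: the equal-priors rule identifies
124 of the 185 attritees (67.0%), but only by misclassifying 713 of the 1953
graduates (36.5%).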

The next table assumes prior probabilities proportional to sample size (91.3%
and 8.7%):


                 PREDICTED GROUP (IRT SCORES)

TRUE GROUP     GRADUATE    ATTRITEE    TOTAL

GRADUATE         1953           0       1953
ATTRITEE          184           1        185

TOTAL            2137           1       2138

In this case, 1954 cases were correctly classified. Unfortunately, the
improvement over the blind "hit" rate is minuscule: one additional observation
correctly classified.

Using  the  NSS  scores,  similar  classification  tables  were  generated.  Assuming 
equal  prior  probabilities  generated  the  following  table: 


                 PREDICTED GROUP (NSS SCORES)

TRUE GROUP     GRADUATE    ATTRITEE    TOTAL

GRADUATE         1217         736       1953
ATTRITEE           62         123        185

TOTAL            1279         859       2138

In this case, 1340 observations (62.7%) were classified correctly. Once again,
this is much worse than the blind "hit" rate.


The  next  table  assumes  prior  probabilities  proportional  to  sample  size 
(91.3%  and  8.7%). 


                 PREDICTED GROUP (NSS SCORES)

TRUE GROUP     GRADUATE    ATTRITEE    TOTAL

GRADUATE         1953           0       1953
ATTRITEE          185           0        185

TOTAL            2138           0       2138

Unfortunately,  the  results  are  exactly  the  same  as  the  blind  "hit"  rate. 

In summary, the discriminant function analyses showed little value for improving
classification accuracy with either the IRT scores or the NSS scores. A major
factor is that the proportion for the graduation group is so high (91.3%) that
it is virtually impossible for any statistical technique to improve upon base
rate prediction.
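Why priors proportional to sample size push nearly every case into the graduate
group can be illustrated with a simplified sketch. It assumes, purely for
illustration, that discriminant scores are normally distributed with a common
standard deviation of 1 in both groups and that the group means differ by half a
standard deviation. Neither figure is taken from this study.

```python
import math
from statistics import NormalDist


def attritee_cutoff(mean_grad, mean_attr, prior_grad, prior_attr, sd=1.0):
    """Score below which a case is classified as an attritee.

    Assumes normal scores with a common standard deviation in both groups
    and mean_grad > mean_attr.
    """
    midpoint = (mean_grad + mean_attr) / 2.0
    shift = sd * sd * math.log(prior_grad / prior_attr) / (mean_grad - mean_attr)
    return midpoint - shift


# HYPOTHETICAL group means on the discriminant score, in SD units.
mean_grad, mean_attr = 0.5, 0.0

equal = attritee_cutoff(mean_grad, mean_attr, 0.5, 0.5)
proportional = attritee_cutoff(mean_grad, mean_attr, 0.913, 0.087)

# Expected share of true attritees falling below each cutoff.
attritees = NormalDist(mean_attr, 1.0)
print(round(equal, 2), round(attritees.cdf(equal), 3))
print(round(proportional, 2), attritees.cdf(proportional))
```

Under these assumptions the equal-priors cutoff sits midway between the group
means, while the proportional-priors cutoff moves more than four standard
deviations below the attritee mean, so essentially no one is classified as an
attritee. This is the pattern seen in the classification tables above.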

In the final statistical procedure, both graduation/attrition and ethnic group
(minority/majority) were taken into account. In other words, a crossed factorial
design was used. The two factors were Ethnic Group (G) and Attrition (A), a
2 X 2 factorial. Schematically, the four groups with the cell Ns were:


               Graduates    Attritees

Minority          275           43
Majority         1678          142
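As a consistency check on the cell counts, the minority row follows from the
majority row and the group totals reported in the classification tables (1953
graduates and 185 attritees). The sketch below performs the subtraction and
recomputes the two attrition rates; the names are illustrative.

```python
total_graduates, total_attritees = 1953, 185  # totals from the classification tables
majority = {"graduates": 1678, "attritees": 142}

# The minority cells are whatever remains of each total.
minority = {
    "graduates": total_graduates - majority["graduates"],
    "attritees": total_attritees - majority["attritees"],
}


def attrition_rate(cell):
    """Attritees as a percentage of all recruits in the group."""
    return 100.0 * cell["attritees"] / (cell["graduates"] + cell["attritees"])


print(minority)                             # {'graduates': 275, 'attritees': 43}
print(round(attrition_rate(minority), 1))   # 13.5
print(round(attrition_rate(majority), 1))   # 7.8
```

Both rates agree with the 13.5% and 7.8% figures reported in the text.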

The attrition rate for minorities was 13.5% and the attrition rate for the
majority was 7.8%. The three dependent variables once again were GCT, ARI, and
MECH. Specifically, this procedure was a non-orthogonal multivariate analysis
of variance. It can also be thought of as a combination of factorial design and
discriminant function analysis. Using the IRT scores, the means for the design
were:


                         GCT       ARI       MECH

MINORITY GRADUATES       50.4      46.3      45.8
MINORITY ATTRITEES       46.2      42.4      41.9
MAJORITY GRADUATES       56.3      50.5      51.5
MAJORITY ATTRITEES       52.0      46.8      47.8

In this design, the test of the G X A interaction is a test for differential
validity. That is, within each ethnic group, are the mean differences between
graduates and attritees the same?


(Since the factorial design was non-orthogonal, the confounding effects of G
and A were partialled out of the test for G X A.)

The multivariate F for the G X A interaction was 0.018, p < .997. This strongly
indicates that there is no evidence whatsoever of differential validity. Of
course this is virtually self-evident from the pattern of cell means.
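The pattern in the cell means can be made explicit by computing, for each test,
the graduate-minus-attritee difference within each ethnic group and then the
difference between those differences (the interaction contrast). The sketch
below uses the IRT cell means. The minority-attritee MECH mean is taken as 41.9,
which is an assumed reading of that entry. These are simple unweighted
contrasts, not the adjusted effects of the non-orthogonal analysis.

```python
# Cell means (IRT scores), ordered GCT, ARI, MECH.
means = {
    ("minority", "graduates"): [50.4, 46.3, 45.8],
    ("minority", "attritees"): [46.2, 42.4, 41.9],  # MECH value is an assumed reading
    ("majority", "graduates"): [56.3, 50.5, 51.5],
    ("majority", "attritees"): [52.0, 46.8, 47.8],
}


def graduate_advantage(group):
    """Graduate mean minus attritee mean on each test within one ethnic group."""
    grads = means[(group, "graduates")]
    attrs = means[(group, "attritees")]
    return [round(g - a, 1) for g, a in zip(grads, attrs)]


minority_gap = graduate_advantage("minority")
majority_gap = graduate_advantage("majority")
interaction = [round(m - j, 1) for m, j in zip(minority_gap, majority_gap)]

print(minority_gap)  # [4.2, 3.9, 3.9]
print(majority_gap)  # [4.3, 3.7, 3.7]
print(interaction)   # [-0.1, 0.2, 0.2]
```

On every test the graduate advantage is about four points in both groups, and
the interaction contrasts are no larger than two-tenths of a point.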

Next,  the  test  of  the  A  factor  (attrition)  was  done,  partialling  out  the 
confounding  effect  of  the  G  factor  (ethnic  group).  The  multivariate  F  for  the 
A  factor  was  24.02,  p  <  .001,  R  =  .181.  The  standardized  discriminant  function 
coefficients  and  the  discriminant  loadings  (or  structure  coefficients)  for  the 
A  factor  were: 


          COEFFICIENTS    LOADINGS

GCT           .385          .800
ARI           .406          .792
MECH          .485          .764

The  coefficients  are  the  weights  applied  to  the  standardized  dependent  variables 
to  compute  the  discriminant  scores.  The  loadings  are  the  correlations  of  the 
dependent  variables  with  the  discriminant  scores.  The  coefficients  indicate 
the  relative  contribution  of  each  variable  to  the  statistical  discrimination 
between  the  graduation  and  attrition  groups.  The  loadings  are  used  (as  in 
factor  analysis)  to  determine  the  underlying  dimensionality  of  the  discriminant 
function. 

Since  coefficients  and  loadings  are  subject  to  large  sampling  variability,  any 
conclusions  should  be  tentative,  however.  The  coefficients  indicate  that  the 
MECH  test  probably  has  a  slightly  larger  relative  contribution  to  distinguishing 
the  graduates  and  attritees  than  do  the  GCT  and  ARI.  The  loadings  seem  to 
indicate  that  all  three  dimensions  (verbal,  arithmetic,  mechanical  comprehension) 
are  required  to  define  the  discriminant  function. 

The test of the G factor (ethnic group) was done, partialling out the confounding
effect of the A factor (attrition). The multivariate F for the G factor was
72.77, p < .001, R = .305. The coefficients and loadings for the G factor were:


          COEFFICIENTS    LOADINGS

GCT           .459          .791
ARI           .177          .650
MECH          .616          .847

The coefficients seem to indicate that the MECH test has the largest relative
contribution to distinguishing between the minority group and the majority
group, and further, that the ARI has almost no relative contribution.

Turning to the NSS scores, the same set of analyses was run (G X A, G, A).
The results were virtually the same. The multivariate F for the A factor
(attrition) was 20.76, p < .001, R = .169. In other words, the difference in R
values between the IRT scores and the NSS scores for attrition was .181 - .169
= .012. Needless to say, the difference is minuscule.


Discussion


To  review  briefly,  a  number  of  comparisons  were  made  between  IRT  scores  and 
number-right  scores  as  predictors  of  boot  camp  attrition.  The  analyses  included 
mental  category  X  attrition  tables,  discriminant  function  analysis,  and  two- 
factor  multivariate  analysis  of  variance. 

There seems to be no doubt that in the case of this sample of USCG recruits,
IRT scoring was no better than number-right scoring for prediction of boot camp
attrition. However, it should be quickly noted that these results definitely
are not an indictment of IRT scoring. First of all, a variable such as boot
camp graduation/attrition is a very crude criterion for prediction from verbal,
arithmetic, and mechanical comprehension ability levels. Also, as mentioned
previously, the attrition rate was only 8.7%. There just isn't much room for
improving upon a base rate prediction accuracy of 91.3% (i.e., simply predict
everyone will graduate).

Another major factor may be the method used to generate the IRT scores. It is
known that Bayesian modal scoring is biased toward the mean just as number-right
scoring is. Consequently, the resultant IRT scores tend to have very high
correlations with number-right scores. Still another factor may be the test
information curves. Unfortunately, CGST test information curves are not yet
available. However, some analyses done on preliminary versions of the CGST
indicate that the test information curves may be flatter than intended. Tests
with relatively flat information curves also tend to result in IRT scores
with high correlations with number-right scores. Indeed, as noted earlier in
this study, the correlations between the IRT scores and the number-right scores
were extremely high: .9897, .9786, and .9735.

To repeat our assertion at the beginning, further research of this kind is
needed in many different settings. There simply is no substitute for real-
world studies in which predictor scores and criterion scores are provided by
real people. This is the only way to determine whether IRT scoring will provide
the true payoff for our organization, namely a significant improvement in
predictive validity as compared to number-right scoring.

With operations and maintenance costs absorbing more than
80% of the total life cycle cost of a typical Army system, there
is increased emphasis on an effective soldier/machine interface.
The notion of "soldier/machine system" is contrasted with that
of "hardware system" along with implications for testing. The
three categories of data (human engineering measurements, user
acceptance/opinion data, and human performance data) are
discussed with a description of collection methods and the role of
each in a soldier/machine system evaluation.

For systems recently fielded by the Army and for those
still in development, it is fairly common for the life cycle
cost projections for a typical system to show only about 20% of
the total cost for the system to be research, development and
acquisition costs. The remaining 80% is absorbed by operation,
maintenance and other support costs. This latter 80% is
heavily weighted with "people costs" to include trainers,
operators, maintainers and others. Thus it is becoming increasingly
apparent to Army planners that greater emphasis on developing
systems which have an effective and efficient soldier/machine
interface, even at the expense of increasing the up-front
acquisition costs, promises soldier/machine systems whose
total life cycle cost effectiveness is dramatically enhanced.

Soldier/Machine  Systems  vs  Hardware 

This  increased  emphasis  on  the  soldier/machine  interface 
has  not  been  a  sudden  change.  Instead  it  has  been  a  gradual 
one  coincident  with  and  related  to  an  increased  willingness  to 
think  in  terms  of  developing  soldier/machine  systems  rather 
than  developing  separate  hardware,  training,  software,  logistic 
support,  technical  documentation  and  facilities  as  separate 
efforts  to  be  later  combined  into  an  effective  system.  It 
should  be  added  at  this  point  that  the  metamorphosis  is  by  no 
means  complete.  There  are  many  members  of  the  developmental 
community who still say "system" when what they really mean is
"hardware".

At first encounter the difference between the two concepts
may seem somewhat superficial, but it has fundamental implications
for the way in which the Army develops new combat systems.
It also has some profound consequences for testing of
developmental systems.

During development, the systems approach requires that many
players get into the process right at the beginning. The materiel
developer, the combat developer, the trainer, the logistician
and several others must participate in defining the goals,
requirements and limitations of the system. Unlike the somewhat
fragmented approach to development mentioned above, the systems
approach requires that each player participate throughout in a
kind of trade-off process. In this process an attempt is made to
arrive at a cost-effective means of acquiring a new combat
capability. Among the trade-offs which might, for example, be
negotiated is a choice between (1) a hardware design which is
high in acquisition costs but imposes human performance and
skill requirements which are few and cheap to acquire and
maintain versus (2) a hardware design which is lower in
acquisition costs, but requires human performance and skills which
may be very costly to acquire and to maintain. Based upon the
available cost predictions, the alternative which offered the
lowest life cycle cost could be selected. With only this very
brief and general treatment of the notion of soldier/machine
systems in development, we will proceed to a discussion of the
consequences of this approach for testing.
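The trade-off between the two kinds of hardware design can be illustrated with
a small worked calculation. All cost figures below are hypothetical, in
arbitrary units, and the field names are invented for this sketch. The second
option is scaled so that acquisition is about 20% of its life cycle cost, in
line with the typical split described earlier.

```python
def life_cycle_cost(option):
    """Acquisition cost plus yearly support costs over the service life."""
    yearly = option["annual_people_cost"] + option["annual_other_support"]
    return option["acquisition_cost"] + yearly * option["service_years"]


# HYPOTHETICAL design alternatives (arbitrary cost units).
options = [
    {"name": "(1) higher-cost hardware, low skill demands",
     "acquisition_cost": 30.0, "annual_people_cost": 2.0,
     "annual_other_support": 1.0, "service_years": 20},
    {"name": "(2) lower-cost hardware, high skill demands",
     "acquisition_cost": 20.0, "annual_people_cost": 3.0,
     "annual_other_support": 1.0, "service_years": 20},
]

for option in options:
    total = life_cycle_cost(option)
    share = 100.0 * option["acquisition_cost"] / total
    print(option["name"], total, round(share, 1))

# Select the alternative with the lowest projected life cycle cost.
best = min(options, key=life_cycle_cost)
print("selected:", best["name"])
```

With these illustrative figures, alternative (1) costs 50% more to acquire but
is cheaper over the life cycle (90 vs. 100 units) because its recurring people
costs are lower.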

Implications  for  Testing 

Once the focus of attention shifts from "materiel" to
"system", the business of testing and evaluation becomes quite
different from straight materiel testing. In component level
and even subsystem level materiel testing, hardware functions
are exercised in a way in which any human function is assumed
to have a probability of 1.0 of being correctly performed the
first time and every time, regardless of the conditions
surrounding the performance. The concern in this type of testing
is nothing more than, "Did the hardware behave as predicted?"

The assumption is explicit that failures due to human error
are not chargeable to the hardware design. An implicit (and
quite questionable) assumption of this approach to development
and testing is that the combination of manpower and training
resources available to the Army when this hardware is fielded
will be capable of meeting whatever human performance
requirements have been built into the system.

Testing and evaluation conducted under a systems approach
acknowledges the influence of the human operator and his or her
performance on total system effectiveness and reliability.
Soldier-in-the-loop testing attempts to exercise the system
using a sample of soldiers who by aptitude, training, experience,
and physical characteristics are typical of those who
will function as part of the system once it is fielded. An
attempt is made to control the variation in this soldier sample
by use of selection criteria and by use of demographic data (6).
Within available test resources, system functions are exercised
over a representative sample of the conditions (terrain, weather,
visibility, etc.) which are anticipated in use of the system in
training and in combat.

In the interest of efficient use of test resources, human
factors testing is frequently done concurrently with testing for
other aspects of system performance. For example, in testing a
vehicular system, data on critical driving tasks might be
collected on the same exercises used to test reliability or
durability of the vehicle.

Human  Factors  Test  Data 

Whether human factors data are collected in a separate test,
or during testing of some other aspect of performance, there
are basically three categories of data which will be collected.
These are (1) human engineering measurements, (2) user
acceptance/opinion data, and (3) human performance data. Each makes
its own unique contribution to the evaluation and will be treated
separately.

Human Engineering Measurements. When the Project Management
Office or the prime contractor on a given system begins to
plan for human factors testing, human engineering measurements
are usually considered first. During the trade-offs mentioned
earlier in this paper as taking place during establishment of
the goals, requirements and limitations associated with a system,
applicable sections of military standards (2,5) and specifications
will have been cited as design standards to be met. The
requirements document may also specify military handbooks (4)
and other sources of guidance to be used in hardware design.
In general, human engineering measurements are the data used
to evaluate compliance with these engineering requirements.

The required data may often be collected without participation
of the human component of the system and may address:

    size
    weight
    lighting level
    noise level
    crew workspace layout
    ingress and egress provisions
    temperature
    whole body vibration
    brightness, legibility and labelling of displays
    placement and force requirements of controls

These are some of the physical measurements which might be made
to permit evaluation against design standards and contract
specifications.

User Acceptance/Opinion Data. A second category of test
data collected for a human factors evaluation is used to learn
from test participants, test control personnel, and observers,
characteristics of the system which would not be revealed by
human engineering measurements. While it is generally accepted
that troops will not function effectively with equipment they
dislike or mistrust, a more important purpose for attempting to
tap the user's experience is that the user can sometimes suggest
improvements in hardware design or in operating procedures which
would never be revealed by checking for compliance with
specifications or standards. After an attempt to function with the
hardware and procedures as a total system, the test participants'
insights can be very predictive of the level of acceptance to be
expected from users once the system is fielded. Their insights
may also identify potential problem areas which merit close
scrutiny during human performance testing.

Generally, user opinion data are collected by means of
questionnaires, structured interviews and voluntary reports.
Questionnaires may be administered periodically throughout the
test, with an interview conducted perhaps at or near the end
of testing. Volunteered reports are recorded as they occur.

Human Performance Data. The overall goal of a human factors
engineering program is to ensure the compatibility of (1)
the soldier, (2) the training, (3) the tasks, and (4) the
equipment. Having compared the hardware characteristics against
human engineering standards and trained a sample of soldiers
and elicited opinions of the equipment and training, the
remaining task in determining whether the human factors engineering
program has met its goal is exercising mission-critical soldier
tasks and collecting and analyzing performance data on those
tasks.

Collection of this category of data usually begins with a
review of the human performance requirements associated with the
system. If the system contract follows MIL-H-46855B and has
specified contract data item DI-H-7055, Critical Task Analysis
Report (3,1) as a deliverable data item, this data item is an
ideal place to begin identifying tasks to exercise in collecting
human performance data. Other input to the task selection process
would include training materials used in training test
participants, and draft technical and training manuals. The list
of tasks to be exercised should have as its highest priority
those tasks whose performance represents an outer limit on total
system performance. An example of such a task might be laying
the gun in a tank system.

There are two basic measures used in human performance
testing. One is performance time; the other is error rate.
For each task exercised and measured in the test, both kinds of
data should be collected on each event. The reason for insisting
that both measures be made of each event is that for most
tasks, performance time and error rate can be traded off one
for the other. The set established in the test participant
can radically affect whether he or she emphasizes speed of
performance at the expense of accuracy or conversely, accuracy at
the expense of speed. For some tasks, the trade-off function
itself might be critically important in affecting operational
doctrine for the system.

Analysis of these data should first compare achieved
performance against the performance goals established in the system
requirements documents. If the trade-off process mentioned
earlier in establishing system goals, requirements, and limitations
has included a thorough treatment of human performance
requirements, human performance time and error rate criteria in
the requirements document will serve as test criteria for
performance of mission critical tasks. Where no such criteria are
established, the performance data will be used in predicting
system performance. The question of "How good is good enough?"
then gets a post hoc answer, but the answer may be a more
reasonable one with knowledge of currently achieved performance.
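The need to record both measures on every event, and to test both against the
requirements, can be sketched as follows. The event data and the criterion
values are hypothetical and the function names are invented for this example.

```python
from statistics import mean

# HYPOTHETICAL events for one task:
# (seconds to complete the task, whether an error occurred on that event)
events = [(11.2, False), (9.8, True), (12.5, False), (10.1, False), (8.9, True)]


def summarize(events):
    """Mean performance time and error rate over all recorded events."""
    times = [seconds for seconds, _ in events]
    errors = [had_error for _, had_error in events]
    return {"mean_time_s": mean(times), "error_rate": sum(errors) / len(errors)}


def meets_criteria(summary, max_mean_time_s, max_error_rate):
    """True only if BOTH the time criterion and the error criterion are met."""
    return (summary["mean_time_s"] <= max_mean_time_s
            and summary["error_rate"] <= max_error_rate)


summary = summarize(events)
print(summary)
# HYPOTHETICAL requirement: mean time of 12 s or less, error rate of 10% or less.
print(meets_criteria(summary, max_mean_time_s=12.0, max_error_rate=0.10))
```

In this illustration the task meets the time criterion while failing the error
criterion, which a test recording performance time alone would not reveal. The
two fastest events are also the two with errors, the kind of speed-for-accuracy
trade described above.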

Another use for these data, perhaps a more important one,
is in identifying areas in which human factors engineering
design improvements have a high payoff potential. If, for
example, the data show an unexpectedly long performance time for
one of a series of sequentially performed tasks, that task would
be identified as a priority candidate for improvement in the
hardware associated with the task or the procedures for performing
it. Consideration might also be given to automating part or
all of its performance.

Soldier/Machine  System  Evaluation 

The bottom line for evaluation of a soldier/machine system
has been reached when the evaluator has answered the question,
"So what?", for each test issue. The specifications and
standards against which we compare engineering measurements should
be met. Their criteria have been developed from experience with
many systems, and meeting those criteria improves the probability
of acquiring an effective and efficient soldier/machine system.
However, meeting those requirements does not ensure that this
has been achieved, nor does failure to meet one or more of the
criteria ensure an unsuccessful system.

Likewise, the user's evaluation is important. The Army
has demonstrated too many times that a system whose design the
soldiers dislike or distrust has little chance of operational
success. Unfortunately (or maybe it isn't so unfortunate,
except for testing), there is not a one-for-one correspondence
between equipment characteristics about which users complain
and equipment characteristics which can be shown to degrade
total system performance. In competitive testing it has also
been observed that soldiers often state a preference for an
equipment design with which their performance is worse than
with a less liked competitor.

The indications from engineering measurements and from
user input are important in a system evaluation. They should
influence the selection of tasks for human performance testing.
Obviously, if an equipment characteristic clearly violates a
human engineering standard or users identify a design
characteristic that they feel significantly degrades performance, these
characteristics merit closer examination. The most direct
answer to the bottom line question, "So what?", comes from human
performance testing. It is from these data that the evaluator
can learn whether the identified system characteristics
significantly affect total system effectiveness.

Earlier exploratory research indicated that officers with different
academic backgrounds (i.e., college majors) performed differently on
certain measures of duty performance. The purpose of this research was
to extend the scope of this earlier research by evaluating the effects
of academic preparation on the relevant performance measures when the
differences in aptitude are controlled. A sample of officers was
divided into five groups on the basis of the major field of study
pursued as an undergraduate. Comparison was made among these five
groups on several performance measures while using certain measures of
aptitude as covariates. Discussion of the findings and implications of
these findings will be presented.

Knowledge of the contribution that academic preparation makes to
the performance of Army officers in actual duty performance could lead to
assignment strategies that would enhance officer utilization and career
progression. Earlier research (Gilbert, Waldkoetter & Castelnovo, 1978)
explored the differences among officers in the Field Artillery branch who
pursued different undergraduate college majors in Officer Basic Course and
on the average Officer Efficiency Report (OER) ratings during the first
year of active duty. The results of that research did not indicate any
statistically significant differences among the different college major
groups on the criterion variables. Subsequent research (Gilbert, 1980)
yielded results that indicated differences among officers with different
academic branches for a sample of officers from all of the 13 Career
Branches on certain aptitude and duty performance measures. Within the
different types of branches (i.e., Combat Arms, Combat Support and Combat
Service Support branches) differences on certain measures of aptitude and
duty performance were also obtained. In both of these investigations, the
analysis of variance technique was employed and consequently the effect of
aptitude measures on duty performance was not controlled.

This research extends the scope of these earlier research efforts.
This investigation was designed to evaluate the differences in duty
performance measures while taking into account the effect of certain aptitude
measures predictive of performance on these measures. These differences
were evaluated (1) within the total sample of officers in all of the 13
career branches, disregarding the type of career branch to which the
officers were assigned, and (2) within groupings of the career branches
(i.e., Combat Arms, Combat Support, and Combat Service Support).


METHOD 

A sample of officers who completed Officer Basic Course (OBC) and
who entered on a tour of active duty after completion of that course was
used as subjects for this research. Officers were divided into five groups
on the basis of the curriculum that they pursued as undergraduates. These
college major groups were Humanities, Business, Engineering, Physical
Science, and Social Science.

The views expressed in this paper are those of the authors and do not

The criterion measures used consisted of three types. The first
set of criterion measures consisted of measures obtained in Officer Basic
Courses (OBC); these measures were the final course grades obtained for the
course and the peer ratings obtained at the end of the course. The second
set of criterion measures were the ratings obtained on a specially
constructed Performance Evaluation Form (Gilbert & Grafton, 1976) which was
based on research reported by Helme, Willemin, and Grafton (1971),
Fleishman (1974), Stogdill (1974) and Willemin (1965). This instrument yielded
a global measure of duty performance and measures along nine separate
dimensions of Army officer performance. Ratings on each of these
dimensions were obtained after 12 to 18 months of duty performance from each
officer's immediate supervisor, from a senior officer who had knowledge of
the officer's performance (but not necessarily the endorsing officer for
Officer Efficiency Report purposes) and from two close associates. These
four ratings were averaged for each of the 10 measures obtained from the
instrument.

The  third  set  of  measures  consisted  of  the  annual  average  Officer 
Efficiency  Report  (OER)  ratings  for  each  of  the  first  three  years  of  active 
duty and the average Officer Efficiency Report rating across all three
years. 

The aptitude measures used as control variables were obtained from
the Officer Evaluation Battery (OEB). The utility of the Officer Evaluation
Battery was presented in an earlier research effort (Gilbert, 1977).
The three aptitude scales of this instrument are the Combat Leadership
(Cognitive), Technical Managerial (Cognitive), and Career Potential
(Cognitive) scales. The multiple correlations between these three measures and
each of the performance measures are shown in Table 1.

Two sets of separate but parallel analyses were performed for each
of the performance measures. One set of analyses consisted of an analysis
of variance for each measure. The other set of analyses consisted of an
analysis of covariance for each measure using the three Officer Evaluation
Battery aptitude measures as covariates. Only those officers for whom all
three covariates and the relevant performance measure were available were
used as subjects in these analyses. These two sets of analyses were
performed for the total sample disregarding career branch. Officers were
then divided on the basis of membership in the three types of career branches:
Combat Arms, Combat Support, and Combat Service Support. The performance
of the five groups of majors within each type of career branch was then
compared using the analysis of variance and the analysis of covariance
techniques.
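
The one-way analysis of variance described above can be illustrated with a
minimal sketch. The grades below are invented for illustration and are not
study data, only three of the five major groups are shown, and the function
name is hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    `groups` maps a group label to a list of scores.
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group size times the squared deviation
    # of the group mean from the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    # Within-groups sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical course grades (assumption, not taken from the paper).
grades = {
    "Engineering": [88, 90, 92],
    "Business": [84, 86, 88],
    "Humanities": [80, 82, 84],
}
f_ratio, df_b, df_w = one_way_anova_f(grades)
print(f_ratio, df_b, df_w)
```

The F-ratio would then be referred to an F table with the two degrees of
freedom shown to judge significance at the .05 or .01 level.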


RESULTS  AND  DISCUSSION 

In Table 2 are shown the means of the five college major groups for
the total sample on all performance measures and the corresponding
F-ratios indicating the difference among these group means by analysis of
variance. Also shown in Table 2 are the adjusted group means for the
different performance measures resulting from the analysis of covariance
using the three aptitude measures as covariates and the corresponding
F-ratios for these analyses.
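
To make the notion of an adjusted group mean concrete, the following sketch
computes covariance-adjusted means for a single covariate. This is a
simplification: the analyses reported here used three covariates, and the
scores, group labels, and function name below are hypothetical.

```python
from statistics import mean

def adjusted_means(groups):
    """Covariance-adjusted group means for one covariate.

    `groups` maps a label to a list of (covariate, score) pairs.
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx  # pooled within-group regression slope
    # Each raw mean is moved to where it would fall if the group's covariate
    # mean equaled the grand covariate mean.
    return {
        label: mean(y for _, y in pairs)
        - slope * (mean(x for x, _ in pairs) - grand_x)
        for label, pairs in groups.items()
    }

# Toy (aptitude, performance) pairs; assumptions for illustration only.
data = {
    "Engineering": [(1, 3), (2, 5), (3, 7)],
    "Business": [(3, 5), (4, 7), (5, 9)],
}
adjusted = adjusted_means(data)
print(adjusted)
```

In this toy case the raw means favor the second group (7 versus 5) while the
adjusted means favor the first (7 versus 5), which shows how the two
techniques can point to different groups on the same measure.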

Table 1

Multiple Correlations between the Three Aptitude
Measures and Each Performance Measure

Performance Measure                  Multiple Correlation

Officer Basic Course Grade                  .40**
Peer Ratings                                .15**
Duty Performance                            .12**
Combat Leadership                           .22**
Technical Managerial Leadership             .09**
Tactical Knowledge                          .23**
Understanding Mission                       .10**
Making Decisions                            .12**
Defining Subordinate Roles                  .06
Planning and Organizing                     .07*
Motivating Troops                           .11**
Logistical Knowledge                        .13**
Annual OER Rating - 1974                    .06*
Annual OER Rating - 1975                    .07*
Annual OER Rating - 1976                    .05
Weighted Average OER                        .06*

*Significant at the .05 level.
**Significant at the .01 level.

Differences were obtained among the five groups of college majors
for  the  Officer  Basic  Course  final  grades  using  both  the  analysis  of 
variance  and  the  analysis  of  covariance  techniques.  In  both  analyses  the 
differences  were  significant  at  the  .01  level  and  the  Engineering  majors 
were  favored.  Significant  differences  were  obtained  among  the  means  of  the 
five  groups  for  the  peer  ratings  received  in  Officer  Basic  Course  using  the 
analysis  of  variance  technique;  these  differences  were  significant  at  the 
.01  level  and  the  Engineering  majors  were  favored  on  this  variable. 
However,  the  results  of  the  analysis  of  covariance  using  this  measure  as 
the  criterion  failed  to  indicate  any  differences  among  the  group  means. 

Results of the analysis of variance and the analysis of covariance
yielded significant differences among the group means at the .01 level for
five of the dimensions of duty performance as measured by the Performance
Evaluation Form. These differences were obtained for Combat Leadership,
Technical Managerial Leadership, Tactical Knowledge, Making Decisions,
and Logistical Knowledge. Physical Science majors were favored in
the Combat Leadership, Tactical Knowledge, and Making Decisions dimensions.
However, Business majors were favored on Technical Managerial and
Logistical Knowledge ratings. Use of the analysis of variance technique
resulted in a difference among the group means on the Understanding of
Mission dimension at the .05 level, on which Business majors were favored,
but analysis of covariance failed to show a difference among the five group
means on this dimension.

On Officer Efficiency Report ratings, differences were obtained among
the five groups for each yearly average using both the analysis of variance
and analysis of covariance techniques. The differences among group means
were significant at the .01 level for the 1974 and 1976 annual average
ratings and at the .05 level for the 1975 annual average rating. Differences
among the group means for the three-year Officer Efficiency Report average
were significant at the .05 level. The mean of the Physical Science majors
was highest on all four Officer Efficiency Report indices.

The means and the adjusted means resulting from the analysis of
covariance for the five college major groups in the Combat Arms branches
are shown in Table 3. The analysis of variance did not reveal significant
differences among the means of the groups for Officer Basic Course final
grades, but the results of the analysis of covariance yielded significance
at the .01 level. The mean of Engineering majors was slightly higher than
that of Business majors, and these two groups were favored over the others
on this variable. The use of either analytic approach failed to reveal
differences for the five college major groups in the Combat Arms for any of
the dimensions of the Performance Evaluation Form with the exception of
Logistical Knowledge. On this variable the analysis of variance approach
failed to yield significance, but the analysis of covariance yielded
significant differences among the group means at the .05 level; the mean
performance of Business majors was favored. On Officer Efficiency Report
measures, similar differences were obtained among the five groups using the
analysis of variance and the analysis of covariance approaches in the
Combat Arms branches. For the 1974 and 1975 annual Officer Efficiency
Report averages and for the three-year average, differences were obtained
at the .01 level, while for the 1976 annual average the differences among
group means were significant only at the .05 level. The mean of the
Physical Science majors was favored on all of these indices.

The results of the analysis of variance and of the analysis of
covariance are shown in Table 4 for the five college major groups in the
Combat Support branches. Results of both the analysis of variance and the
analysis of covariance yielded significant differences on Officer Basic
Course grades at the .01 level, and in both analyses the mean of
Engineering majors was favored. On the Combat Leadership and Tactical
Knowledge dimensions of the Performance Evaluation Form, differences were
obtained among the means of the five groups at the .01 level, and in both
instances Physical Science majors were favored. Significance was also
obtained at the .01 level in the analysis of variance on the
Technical-Managerial Leadership dimension, and at the .05 level in the
analysis of covariance on the Decision-Making dimension. Engineering majors
were favored on the Technical-Managerial Leadership dimension, and Physical
Science majors were favored on the Decision-Making dimension. Differences
were not obtained among the means of the five college major groups on the
four Officer Efficiency Report indices.

The results of the analyses for the five college major groups are
shown in Table 5 for the officers in the Combat Service Support branches.
The difference among the group means on Officer Basic Course final grades
was significant at the .01 level using the analysis of variance approach;
the mean of the Engineering group was highest. However, the results of the
analysis of covariance indicated differences significant at the .05 level,
and the adjusted mean of the Business majors was highest. Results of the
analysis of variance indicated significance among the groups for peer
ratings at the .05 level, on which Physical Science majors were favored;
however, the results of the analysis of covariance failed to indicate
differences among the groups on that measure. Analysis of variance results
yielded differences at the .05 level for three of the dimensions of the
Performance Evaluation Form. These differences were obtained for the Combat
Leadership, Tactical Knowledge, and Logistical Knowledge dimensions, and
Physical Science majors were favored on all three dimensions. The results
of the analysis of covariance, however, failed to indicate any difference
among the five groups in the Combat Service Support branches on those
measures. Neither the results of the analysis of variance nor those of the
analysis of covariance indicated any differences among the means of the
five groups on the Officer Efficiency Report annual ratings or on the
three-year average of Officer Efficiency Report ratings.

In summary, Engineering majors received higher Officer Basic Course
grades in the total sample, in the Combat Arms branches, and in the Combat
Support branches, while Business majors received higher Officer Basic Course
grades in the Combat Service Support branches. This may be due to the
relevance of these college curricula to the curricula of the Officer Basic
Courses. Business majors were favored on the Technical Managerial ratings
in the total sample and on the Logistical Knowledge ratings in the total
sample and in the Combat Arms branches. These results are to be expected
since both dimensions are characteristic of staff-managerial functions.
However, Physical Science majors were favored on Logistical Knowledge
ratings in the Combat Service Support branches. Physical Science majors
received higher Tactical Knowledge ratings in the total sample, in the
Combat Support branches, and in the Combat Service Support branches and were
rated higher on Making Decisions in the total sample and in the Combat Support
branches; they also had a higher group mean on the Officer Efficiency
Report (OER) indices in the total sample and in the Combat Arms branches.

The results of this research tend to indicate that certain college
curricula may be more compatible than others with officer duty performance
within the different types of Army career branches. The fact that these
findings are not clear-cut may be due to the fact that the grouping of
branches used did not provide for a homogeneous set of performance
requirements and that the groupings of college majors were not sufficiently
refined. Future research in this area will take those considerations into
account.

Army Officer Success in the Different Types of Career Branches

The purpose of this research was to evaluate the utility of
different measures of potential in identifying groups of Army officers who
were defined as being successful in different types of career branches.
Several measures of aptitude and motivation were used in the analyses.
Implications of the findings will be presented.

In previous research, the characteristics of those officers who
received higher final course grades in Officer Basic Courses (OBC) than
would have been predicted on the basis of aptitude measures were explored
(Gilbert, 1980). The results of that research indicated that those
officers displayed a greater interest in becoming an Army officer and a greater
interest in those activities related to success as a combat leader, as these
attributes were reflected by the relevant scales of the Officer Evaluation
Battery (OEB). Other research has explored the utility of certain measures
in the prediction of different aspects of duty performance (Gilbert, 1977).
While the research indicated the utility of certain of these measures,
there were certain differences in the utility of these predictors in the
different types of career branches (i.e., Combat Arms, Combat Support, and
Combat Service Support). These research efforts then led to the question
of whether success in the different types of assignments is related to
different patterns of aptitude, motivation, and ratings on early
performance on active duty. If so, this information could be used for assignment
purposes to ensure that officers are assigned to those duty positions in
which their aptitude and motivation are most fully utilized. This
investigation was designed to explore these considerations.

Specifically, the purpose of this research was to explore differences
that might exist among groups of officers who received higher than average
ratings in the different types of career branches. These differences were
evaluated on certain measures of aptitude and motivation as well as
performance along certain leadership dimensions early in their tour of active
duty. The utility of the variables in differentiating among these groups
of officers was evaluated individually and in certain combinations.


METHOD 

A sample of officers who attended Officer Basic Courses (OBC) in
thirteen Army career branches and who continued on active duty after
completion of Officer Basic Course was used as subjects in the research.
A weighted average of the Officer Efficiency Reports received by these
officers for the first three years of active duty was computed, and an
average of these weighted averages was obtained. Officers who received a
greater than average weighted average were selected as subjects. This
group was then divided on the basis of membership in the three types of
career branches: Combat Arms, Combat Support, and Combat Service Support.
The three groups of officers were then compared on two different types of
measures.
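
The selection rule just described can be sketched as follows. The paper does
not state the weights used, so the sketch assumes each annual average is
weighted by the number of reports received that year; the officers, ratings,
and weights are invented.

```python
from statistics import mean

def weighted_average(ratings):
    """`ratings` is a list of (annual_average, weight) pairs."""
    total_weight = sum(w for _, w in ratings)
    return sum(r * w for r, w in ratings) / total_weight

# Hypothetical officers: (annual OER average, number of reports that year).
officers = {
    "A": [(190, 1), (195, 2), (198, 1)],
    "B": [(180, 2), (185, 1), (182, 1)],
    "C": [(188, 1), (190, 1), (192, 1)],
}
scores = {name: weighted_average(r) for name, r in officers.items()}
# The cutoff is the average of the officers' weighted averages.
cutoff = mean(list(scores.values()))
# Only officers above the cutoff are retained as subjects.
selected = sorted(name for name, s in scores.items() if s > cutoff)
print(cutoff, selected)
```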

The first set of measures consisted of measures obtained in Officer
Basic Course, and included the Officer Evaluation Battery, final course
grades in Officer Basic Course, and peer ratings received at the end of the
Officer Basic Course. The Officer Evaluation Battery (OEB) reflects
measures of aptitude and motivation in seven scale scores. The predictive
utility of this instrument is discussed in an earlier paper (Gilbert, 1978),
and a brief description of the scales is presented in Figure 1.

The second set of measures consisted of duty performance measures
obtained from a specially constructed Performance Evaluation Form (Gilbert
and Grafton, 1976), which yields a global measure of duty performance and
measures along nine dimensions of duty performance. Ratings on the
Performance Evaluation Form were obtained after twelve to eighteen months of
duty performance.

The first series of analyses consisted of comparing the means of
the three groups of officers (i.e., Combat Arms, Combat Support, and
Combat Service Support) on each of the available measures. The
discriminant analysis technique was used to evaluate the efficacy of all of these
variables in defining group membership. Two other discriminant
analyses were performed; one of these analyses used only the Officer Basic
Course measures while the other used only the duty performance measures.

The second series of analyses was performed as an exploratory strategy
to evaluate what the effect on group membership prediction would be if only
two groups of officers were involved (i.e., differentiation between Combat
Arms officers and officers not in the Combat Arms). For this series of
analyses, the subjects in the Combat Support and in the Combat Service
Support branches were combined. Thus, for each of the individual measures,
differences between the mean of officers in the Combat Arms branches and the
mean of officers in the other branches were evaluated, and a series of
discriminant analyses paralleling the three-group design was performed.
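
For readers unfamiliar with the two-group case, a minimal sketch of a linear
discriminant function follows. It assumes only two predictor variables, a
midpoint cutoff (equal priors), and invented scores; it is not the program
used in this research, and the function names are hypothetical.

```python
from statistics import mean

def fisher_two_group(group_a, group_b):
    """Fit a two-variable linear discriminant; return (weights, cutoff).

    Each group is a list of (x1, x2) score pairs.
    """
    def centroid(g):
        return mean(p[0] for p in g), mean(p[1] for p in g)

    ma, mb = centroid(group_a), centroid(group_b)
    # Pooled within-group sums of squares and cross-products.
    s11 = s12 = s22 = 0.0
    for g, m in ((group_a, ma), (group_b, mb)):
        for x1, x2 in g:
            dev1, dev2 = x1 - m[0], x2 - m[1]
            s11 += dev1 * dev1
            s12 += dev1 * dev2
            s22 += dev2 * dev2
    det = s11 * s22 - s12 * s12
    diff1, diff2 = ma[0] - mb[0], ma[1] - mb[1]
    # weights = S^-1 (ma - mb); rescaling S leaves the classification unchanged.
    w = ((s22 * diff1 - s12 * diff2) / det, (-s12 * diff1 + s11 * diff2) / det)
    # Cutoff halfway between the two projected centroids.
    cutoff = (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1])) / 2
    return w, cutoff

def classify(point, w, cutoff):
    return "A" if w[0] * point[0] + w[1] * point[1] > cutoff else "B"

# Hypothetical (combat scale, technical scale) scores; not study data.
combat_arms = [(6, 2), (8, 2), (7, 3), (7, 1)]
others = [(2, 6), (4, 6), (3, 7), (3, 5)]
weights, cutoff = fisher_two_group(combat_arms, others)
hits = sum(classify(p, weights, cutoff) == "A" for p in combat_arms)
hits += sum(classify(p, weights, cutoff) == "B" for p in others)
percent_correct = 100.0 * hits / (len(combat_arms) + len(others))
print(weights, cutoff, percent_correct)
```

The percent of cases correctly classified is computed here on the same cases
used to fit the function, so it would tend to overstate accuracy on new cases.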


RESULTS  AND  DISCUSSION 

The means of the three groups of officers in the three types of career
branches are shown in Table 1 for the measures of aptitude and motivation
of the Officer Evaluation Battery (OEB). Significant differences
were found among the three groups on three scales of the OEB at the .01
level; these differences were obtained for the Combat Leadership
(Cognitive), Technical-Managerial (Non-cognitive), and Career Potential
(Non-cognitive) scales. A statistical difference was also obtained among
the three groups on the Combat Leadership (Non-cognitive) and on the
Technical-Managerial (Non-cognitive) scales at the .05 level. The mean
scores for officers in the Combat Arms branches were highest on the Combat
Leadership (Cognitive) scale, the Technical-Managerial (Cognitive) scale,
the Combat Leadership (Non-cognitive) scale, and the Career Potential
(Non-cognitive) scale. The mean performance of that group was lowest on the
Technical-Managerial (Non-cognitive) scale.


SUBTEST                             DESCRIPTION OF ITEMS

Combat Leadership                   Military tactics; practical skills in a
(Cognitive)                         variety of areas ranging from outdoor
                                    activities to mechanical and electronic
                                    applications

Technical-Managerial Leadership     History; politics; culture; mathematics;
(Cognitive)                         physical sciences

Career Potential (Cognitive)        Technological knowledge relevant to
                                    military requirements

Combat Leadership                   Combat leader qualities, occupational
(Non-Cognitive)                     interests, sports interest, outdoor
                                    interests related to combat leadership

Technical-Managerial Leadership     Mathematics and physical sciences skills
(Non-Cognitive)                     and interest; urban or rural background;
                                    scientific interest and ability; decisive
                                    leader qualities; and verbal-social
                                    leadership

Career Potential                    Clerical-administrative interest versus
(Non-Cognitive)                     white collar interest, combat interest

Career Intent                       Intention of making the Army a career
                                    choice

Figure 1. Officer Evaluation Battery (OEB) Subtests and Description of
Items.

Statistically significant differences were not obtained among the
three groups of officers on the final course grades obtained in Officer
Basic Course. The differences among the three group means were
statistically significant at the .01 level on peer ratings obtained in Officer
Basic Course; officers in the Combat Service Support branches had the
highest mean on this variable.

The ratings of the three groups of officers on the dimensions of the
Performance Evaluation Form are shown in Table 2. Statistically
significant differences were obtained among the group means on the Combat
Leadership scale and the Tactical Knowledge scale at the .01 level and on the
Technical-Managerial Leadership ratings at the .05 level. The mean ratings
of officers in the Combat Arms group were highest for Combat Leadership and
for the Tactical Knowledge ratings. Officers in the Combat Service Support
branches were favored in the mean ratings on the Technical-Managerial
scale. The discriminant analysis technique used to test the efficacy of
the variables in predicting group membership yielded the classification
data shown in Table 3. The first analysis involving all variables (i.e.,
Officer Basic Course measures and measures of on-the-job performance)
yielded 73.2 percent correct classification. Use of the Officer Basic
Course variables alone yielded correct classification of 71.6 percent.
The duty performance measures obtained from the Performance Evaluation
Form also yielded an index of correct classification of 69.7 percent.
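
A classification table of the kind summarized above can be tabulated from
actual and predicted group memberships as in the sketch below. The labels
and cases are hypothetical and the function name is an assumption; the
sketch only shows how row percentages and the overall percent of cases
correctly classified are obtained.

```python
from collections import Counter

def classification_table(actual, predicted, labels):
    """Row percentages of predicted group within each actual group,
    plus the overall percent of cases correctly classified."""
    counts = Counter(zip(actual, predicted))
    table = {}
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        table[a] = {p: 100.0 * counts[(a, p)] / row_total for p in labels}
    correct = sum(counts[(a, a)] for a in labels)
    return table, 100.0 * correct / len(actual)

# Hypothetical cases: CA = Combat Arms, CS = Combat Support,
# CSS = Combat Service Support.
labels = ["CA", "CS", "CSS"]
actual = ["CA"] * 4 + ["CS"] * 2 + ["CSS"] * 2
predicted = ["CA", "CA", "CA", "CS", "CA", "CS", "CA", "CSS"]
table, percent_correct = classification_table(actual, predicted, labels)
print(table["CA"], percent_correct)
```

Because the overall figure weights each row by its group size, a large group
that is classified well can keep the overall percentage high even when the
smaller groups are classified poorly.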

In the second series of analyses, officers in the Combat Support
branches and the Combat Service Support branches were combined into one
group and compared with the officers in the Combat Arms group. In
Table 4, the means for the Combat Arms officers and other officers are
presented for the Officer Basic Course (OBC) measures. The mean for the
Combat Arms group was significantly higher at the .01 level on the Combat
Leadership (Cognitive) and the Career Potential (Non-cognitive) scales of
the Officer Evaluation Battery and at the .05 level on the Combat
Leadership (Non-cognitive) scale of that instrument. Officers not assigned to
the Combat Arms branches had a significantly higher (.01 level) mean on the
Technical-Managerial (Non-cognitive) scale. The mean of those officers not
in the Combat Arms branches was significantly higher (.01 level) for peer
ratings than was the mean of those officers in the Combat Arms branches.

In Table 5, the means for the two groups of officers on duty
performance measures are shown. The mean performance of officers in the Combat
Arms branches is higher at the .01 level on combat leadership and tactical
knowledge ratings and at the .05 level on the decision making ability ratings.
On the other hand, officers who were not in the Combat Arms branches were
favored in terms of mean rating on the technical managerial dimension.




Table 3

Three Group Classification

All Variables:
                                          Predicted
                            Combat      Combat      Combat Service
Actual                      Arms        Support     Support

Combat Arms                 94.4         3.7         1.9
Combat Support              69.8        19.4        10.8
Combat Service Support      30.9        20.6        48.5

Percent of Cases Correctly Classified = 73.2

Officer Basic Course Variables:
                                          Predicted
                            Combat      Combat      Combat Service
Actual                      Arms        Support     Support

Combat Arms                 97.9         0.7         1.4
Combat Support              79.1        12.2         8.6
Combat Service Support      61.8        11.8        26.5

Percent of Cases Correctly Classified = 71.6

Duty Performance Measures:
                                          Predicted
                            Combat      Combat      Combat Service
Actual                      Arms        Support     Support

Combat Arms                 96.3         0.9         2.8
Combat Support              89.2         2.9         7.9
Combat Service Support      58.8         2.9        38.2

Percent of Cases Correctly Classified = 69.7

Final Course Peer Ratings  99.14  106.94  101.67  26.33**

*Significant at the .05 level
**Significant at the .01 level


In  Table  6  the  classification  of  the  two  groups  of  officers  using 
the  discriminant  analysis  technique  is  shown.  When  all  of  the  variables 
were  employed  (i.e.,  Officer  Basic  Course  measures  and  measures  of  duty 
performance),  80  percent  correct  classification  resulted.  The  use  of  the 
Officer Basic Course measures alone resulted in 76.1 percent correct
classification  while  use  of  the  duty  performance  dimensions  resulted  in 
73.9  percent  correct  classification. 
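
The overall percentage of correct classification is a weighted average of
the two row percentages, with each row weighted by its group's share of the
sample. As a rough consistency check, the sketch below solves for the share
of Combat Arms officers implied by the Table 6 entries. The function name is
hypothetical, and the implied shares are derived arithmetic, not figures
reported by the authors.

```python
def implied_share(row_a_correct, row_b_correct, overall):
    """Share p of group A implied by overall = p * a + (1 - p) * b."""
    return (overall - row_b_correct) / (row_a_correct - row_b_correct)

# (Combat Arms correct, Others correct, overall correct), from Table 6.
table6 = {
    "all variables": (92.3, 51.2, 79.0),
    "OBC variables": (93.0, 41.1, 76.0),
    "duty performance": (91.4, 37.7, 73.9),
}
shares = {
    name: round(implied_share(ca, others, overall), 3)
    for name, (ca, others, overall) in table6.items()
}
print(shares)
```

Under this reading, all three analyses imply that roughly two-thirds of the
selected officers were in the Combat Arms branches, so the table entries are
mutually consistent to within rounding.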

The results of this research indicate that officers in the Combat
Arms branches who received higher than average Officer Efficiency Report
ratings displayed a greater aptitude for success in combat type situations
than did the other officers. This is reflected in their scores on the
Combat Leadership (Cognitive) scale of the Officer Evaluation Battery. However,
officers in the Combat Arms branches also score higher on those aptitudes
necessary for a staff or technical position as measured by the
Technical-Managerial scale of the same instrument, but the differences are not as
pronounced. When these officers were compared to all other officers on
this measure, a statistically significant difference was not obtained. The
group of higher rated officers also displayed a greater interest in a
career as an Army officer as measured by the Career Potential scale of the
Officer Evaluation Battery and interest in those activities related to
success as a combat leader as measured by the Combat Leadership
(Non-cognitive) scale of this instrument. Interest in activities related to
staff and technical types of activities characterized those officers who
were not in the Combat Arms branches. Officers who received higher than
average Officer Efficiency Report ratings were rated higher than other
officers early in their active duty tour on combat leadership and tactical
knowledge while other officers were rated higher on technical-managerial
leadership.

The measures used in this research were of utility in distinguishing
officers who received higher than average Officer Efficiency Report ratings
in the Combat Arms from other officers who received higher ratings in other
branches. Future research will be directed toward exploring the relative
merits of the indices used in this exploratory research in differentiating
among officers who are highly rated in more specific officer specialties.

Table 6

Two Group Classification

All Variables:
                              Predicted
Actual               Combat Arms     Others

Combat Arms             92.3           7.7
Others                  48.8          51.2

Percent of Cases Correctly Classified = 79.0

Officer Basic Course Variables:
                              Predicted
Actual               Combat Arms     Others

Combat Arms             93.0           7.0
Others                  58.9          41.1

Percent of Cases Correctly Classified = 76.0

Duty Performance Measures:
                              Predicted
Actual               Combat Arms     Others

Combat Arms             91.4           8.6
Others                  62.3          37.7

Percent of Cases Correctly Classified = 73.9

The field of occupational analysis benefits from the use of CODAP, which
is a computerized approach for processing task inventory data. CODAP is an
acronym for Comprehensive Occupational Data Analysis Programs and produces
printouts which job analysts have found invaluable in understanding and
interpreting work performed by individuals in a wide range of career fields. The
recent development of CODAP System 80 greatly expands the job analyst's ability
to process task data. The new features of System 80 overcome certain
restrictions and limitations inherent in the older version of CODAP. One restriction
limits the number of task modules (duty fields) in a study to 26, and once the
task modules are defined and categorized, they cannot be changed. CODAP
System 80 permits wide flexibility in defining the modules into which tasks
are to be grouped. This paper outlines the need to re-examine the process by
which we define duty fields for occupational analysis. Traditional methods
hinder job analysts' ability to make accurate interpretations of data and cause
job analysts to spend work time on unproductive tasks.

The field of occupational analysis is now two decades old and has
experienced many changes, developments, and improvements in the quality of
task inventory content and the computer software for processing task inventory
data. Occupational analysis is the official procedure by which all branches
of the US Armed Services collect and analyze job and task information to guide
decision makers in developing training, testing, and occupational
classifications. The adoption of the task inventory approach by local, state, and federal
agencies, power utilities, and private corporations is increasing at a rapid
rate. This is because the occupational analysis process meets the requirements
defined by the federal uniform guidelines for conducting job analysis prior to
developing personnel selection exams. Of all major job analysis methods,
occupational analysis is the most widely used for developing job analytic data.

The foundation for occupational analysis was laid in the 1960's through
completion of research studies which contributed to the development of basic
methodologies for collecting and analyzing task inventory data. In 1960,
McCormick and Ammerman published their report entitled "Development of Work
Activity Check Lists for Use in Occupational Analysis" and established the
utility of partitioning jobs into task statements for collecting job information
directly from incumbent workers. As a result, the US Air Force, which sponsored
this study, incorporated the list of task activities into a comprehensive job
survey instrument known as a job or task inventory. In time, task inventories
for many occupational fields grew to contain 400 to 600 tasks, thus presenting
to the occupational analyst very large amounts of detailed data for analysis.
Consequently, a way was needed by which summary statistics could be developed
for groups of tasks known as duty fields. Mayo published a report in 1969
entitled "Three Studies of Job Inventory Procedures: Selecting Duty Categories,
Interviewing, and Sampling." In this study, Mayo evaluated "alternative
variations in categorization of tasks" and developed recommended duty categories for
grouping task statements. The report suggested three stable formats for
organizing non-supervisory tasks. These include the work section, work function, and
equipment use formats. In regard to the work section format, Mayo recommended
tasks be arranged in groups parallel to the work sections of organizational
charts. Accounting and finance may typically be divided into travel, pay
accounts, control, collecting, and stock funds sections. Hence tasks could be
grouped according to these sections.

Using the work function format, an inventory developer should create
groupings of tasks according to major work functions such as equipment maintenance.
This is most appropriate when one piece of equipment, such as a helicopter,
is the focus of work. Typical duty field headings should include inspection,
troubleshooting, adjusting, and repairing equipment items. In this manner, tasks
are grouped because of some cohesive common denominator relative to work function.
In the equipment use format, workers are responsible for maintenance of different
types of equipment, and Mayo recommends grouping tasks under duty titles such
as "maintaining generators", "maintaining intake and exhaust systems", and
"maintaining diesel engines", since the work of an individual may encompass
activities associated with several types of equipment.

The study did not address how the number of tasks in a duty field, or a
task module as it is called in CODAP System 80, would affect the analysis of
data and the workload for the job analysts. Often task inventories are developed
by a group with no responsibility for analyzing data. When this occurs,
problems usually result, since a duty field containing a high number of tasks
is difficult for the job analysts to analyze and interpret.
Typically these problems arise because the task inventory developer failed
to follow conventional guidelines and grouped tasks under broad duty fields
containing tasks which in reality represent diverse work sections of an organization,
work functions, or equipment uses. A review of these duty fields usually
reveals that tasks could be further divided into smaller duty fields, bringing
more precision to the analysis and less mental strain to the analyst's
work.


The following example illustrates the need for revised thinking about
how we group tasks into duty fields. Figure 1 contains a duty field entitled
"Performing Systems Design Functions" and reflects the "functional approach"
for categorizing tasks into duty fields. The study was conducted with all
twenty-six tasks in the one duty field. A close examination of the individual
tasks reveals that they can be further subdivided into two groups. Figure 2
reveals that one group of tasks relates to the development of system procedures
and specifications for a proposed system. The second group defines the types
of tasks involved in working and interacting with programmers and other systems
analysts in designing a new system. It is apparent that office study and
research to develop the procedures and specifications for a new system requires
a very different set of behaviors than managing and directing other staff
members. The extent to which a systems analyst performs tasks in either
of the two groups could be determined in the analysis of data in one of two
ways. If the duty field categorization had been defined and titled to reflect
these two groups, then the duty level job descriptions would report workers'
performance regarding each set of tasks. If all tasks are grouped under one
category, then the analyst must work through the task level job description and
locate and separate the different sets of tasks. This is not an enjoyable
process, as any job analyst knows.
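
The difference between the two categorizations can be sketched in a few lines of Python. This is only an illustration, not CODAP System 80 output: the four incumbents and their responses are invented, the task titles are borrowed from Figures 1 and 2, and "percent of members performing at least one task" is assumed here as a stand-in for whatever duty level statistic an analyst would actually report.

```python
# Hypothetical survey data: each incumbent maps to the set of tasks performed.
responses = {
    "incumbent_1": {"ASSIGN PERSONNEL TO PROJECTS",
                    "DEVELOP WORK SCHEDULES FOR PROJECT"},
    "incumbent_2": {"CONDUCT HARDWARE CAPACITY PLANNING REVIEWS"},
    "incumbent_3": {"CONDUCT HARDWARE CAPACITY PLANNING REVIEWS",
                    "ANALYZE AND DETERMINE THE TYPE OF FORMS REQUIRED TO COLLECT DATA"},
    "incumbent_4": {"ASSIGN PERSONNEL TO PROJECTS"},
}

# Two task groups taken from Figure 2 (only a few tasks each, for brevity).
project_management = {"ASSIGN PERSONNEL TO PROJECTS",
                      "DEVELOP WORK SCHEDULES FOR PROJECT"}
system_procedures = {"CONDUCT HARDWARE CAPACITY PLANNING REVIEWS",
                     "ANALYZE AND DETERMINE THE TYPE OF FORMS REQUIRED TO COLLECT DATA"}

# One broad duty field versus two narrower ones.
one_field = {"DESIGNING SYSTEMS": project_management | system_procedures}
two_fields = {
    "DESIGNING SYSTEMS (PROJECT MANAGEMENT)": project_management,
    "DESIGNING SYSTEMS (SYSTEM PROCEDURES)": system_procedures,
}


def percent_performing(responses, duty_fields):
    """Percent of incumbents performing at least one task in each duty field."""
    n = len(responses)
    return {
        duty: 100.0 * sum(1 for tasks in responses.values() if tasks & members) / n
        for duty, members in duty_fields.items()
    }


broad = percent_performing(responses, one_field)
split = percent_performing(responses, two_fields)
print(broad)
print(split)
```

With these made-up responses the single broad field reports that every incumbent performs systems design work, while the two narrower fields show that half of the group does project management and the other half works on system procedures. That distinction is what the analyst would otherwise have to dig out of the task level job description.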

There is clearly a need for a revised look at how we classify tasks into
duty categories and how this classification affects the work of job analysts
and the accuracy of their analysis. In many cases, task inventory developers
are unaware of the difficulties created by random methods for categorizing
duties. By using CODAP System 80, we are no longer tied to using one approach
for categorizing tasks and assigning them to duty fields. In fact, it is often
desirable to develop different duty categories based upon planned use of data.
Our work has shown that classification and training each require separate
schemes for assigning tasks to duty fields. As revealed by the example, it is
also desirable to construct duty fields containing tasks representing more explicit
groups of behavior. This would be helpful in classification, where
duty level job descriptions pinpoint the nature of a work specialty more precisely.
Such cannot be readily determined using duty fields containing tasks
representing diverse types of behavior or more than one work section or work
function. With regard to training, curricula for a training program are often
divided into training modules that represent various categories of instruction.
Approaches in training evaluation require that each task in an inventory be
assigned to the module whose training prepares a worker
to perform the task. In this situation, long training programs may encompass
a large number of modules and require that tasks be apportioned accordingly.

These are but a few of the situations that demand we take a fresh look
at how we define duty categories for task inventories. The work of occupational
analysts in the 1980's will require more efficient and effective methods for
processing and analyzing task inventory data to meet user needs.


Figure 1

DESIGNING SYSTEMS

ADVISE PROGRAMMERS AND SYSTEMS ANALYSTS ON DIVISION'S POLICY AND PROCEDURES
    FOR DATA PROCESSING.
ANALYZE AND DETERMINE THE TYPE OF FORMS REQUIRED TO COLLECT DATA.
ANALYZE THE LOGIC FUNCTIONS REQUIRED IN THE SPECIFICATIONS FOR WRITING PROGRAMS.
ASSIGN ANALYSTS TO PREPARE A PROPOSAL FOR PRESENTATION TO A REVIEW BOARD.
ASSIGN PERSONNEL TO PROJECT QUALITY REVIEW BOARDS (QRBS).
ASSIGN PERSONNEL TO PROJECTS.
ASSIGN SYSTEMS AND PROGRAMMING PROJECTS TO DATA PROCESSING SUPPORT TEAMS.
ASSIGN TASKS TO PROGRAMMER.
CONDUCT HARDWARE CAPACITY PLANNING REVIEWS.
DEFINE THE FUNCTIONS TO BE ASSIGNED TO PROGRAMMERS.
DETERMINE THE LEVEL OF PERSONNEL NECESSARY TO DEVELOP THE SYSTEM.
DETERMINE THE TYPES OF EQUIPMENT REQUIRED TO SUPPORT A SYSTEM.
DETERMINE THE TYPES OF PROGRAMS AND ROUTINES THAT WILL BE REQUIRED TO DEVELOP
    A SYSTEM.
DETERMINE TIME ESTIMATES TO COMPLETE TASKS.
DEVELOP WORK SCHEDULES FOR PROJECT.
PARTICIPATE IN WALKTHROUGHS AND REVIEWS OF PROJECTS AND/OR SYSTEMS.
PREPARE WORKPLANS FOR PROJECT.
REVIEW A PROPOSED PROJECT WITH SYSTEMS ANALYST AND USER TO DEFINE THE SCOPE
    OF THE PROJECT.
REVIEW AND EVALUATE DATA AND INFORMATION COLLECTION PROCEDURES.
REVIEW AND EVALUATE SPECIFICATIONS FOR A PROPOSED SYSTEM.
REVIEW FINDINGS OF FEASIBILITY STUDIES TO DETERMINE APPROPRIATE ACTIONS.
REVIEW MANUALS FOR PROPOSED SOFTWARE.
REVIEW PROJECT SCHEDULES AND ESTABLISH PRIORITIES FOR PROJECT START-UPS.
REVIEW PROPOSED STANDARDS FOR SOFTWARE.
REVIEW SYSTEMS DESIGN SPECIFICATION TO ASSESS ADEQUACY OF APPROACH.
REVIEW SYSTEMS DEVELOPMENT PLANS TO DETERMINE WHAT ADDITIONAL STAFF AND
    RESOURCES ARE NECESSARY.


Figure 2

DESIGNING SYSTEMS (PROJECT MANAGEMENT)

ADVISE PROGRAMMERS AND SYSTEMS ANALYSTS ON DIVISION'S POLICY AND PROCEDURES
    FOR DATA PROCESSING.
ASSIGN ANALYSTS TO PREPARE A PROPOSAL FOR PRESENTATION TO A REVIEW BOARD.
ASSIGN PERSONNEL TO PROJECT QUALITY REVIEW BOARDS (QRBS).
ASSIGN PERSONNEL TO PROJECTS.
ASSIGN SYSTEMS AND PROGRAMMING PROJECTS TO DATA PROCESSING SUPPORT TEAMS.
ASSIGN TASKS TO PROGRAMMER.
DEFINE THE FUNCTIONS TO BE ASSIGNED TO PROGRAMMERS.
DETERMINE THE LEVEL OF PERSONNEL NECESSARY TO DEVELOP THE SYSTEM.
DETERMINE TIME ESTIMATES TO COMPLETE TASKS.
DEVELOP WORK SCHEDULES FOR PROJECT.
PARTICIPATE IN WALKTHROUGHS AND REVIEWS OF PROJECTS AND/OR SYSTEMS.
REVIEW A PROPOSED PROJECT WITH SYSTEMS ANALYST AND USER TO DEFINE THE SCOPE
    OF THE PROJECT.
REVIEW SYSTEMS DEVELOPMENT PLANS TO DETERMINE WHAT ADDITIONAL STAFF AND
    RESOURCES ARE NECESSARY.


DESIGNING SYSTEMS (SYSTEM PROCEDURES)

ANALYZE AND DETERMINE THE TYPE OF FORMS REQUIRED TO COLLECT DATA.
ANALYZE THE LOGIC FUNCTIONS REQUIRED IN THE SPECIFICATIONS FOR WRITING PROGRAMS.
CONDUCT HARDWARE CAPACITY PLANNING REVIEWS.
DETERMINE THE TYPES OF EQUIPMENT REQUIRED TO SUPPORT A SYSTEM.
DETERMINE THE TYPES OF PROGRAMS AND ROUTINES THAT WILL BE REQUIRED TO DEVELOP
    A SYSTEM.
PREPARE WORKPLANS FOR PROJECT.
REVIEW AND EVALUATE DATA AND INFORMATION COLLECTION PROCEDURES.
REVIEW AND EVALUATE SPECIFICATIONS FOR A PROPOSED SYSTEM.
REVIEW FINDINGS OF FEASIBILITY STUDIES TO DETERMINE APPROPRIATE ACTIONS.
REVIEW PROJECT SCHEDULES AND ESTABLISH PRIORITIES FOR PROJECT START-UPS.


Computerized Optimal Weight Scoring (COWS): A Comparison of Three
Procedures


This paper documents the results of a simulation study comparing
three computerized optimal weight scoring (COWS) procedures based on
item response theory, with conventional scoring. The effect of
the COWS procedures is to differentially weight test items as a function
of examinee ability and the item characteristics. Low ability
examinees are given very low weights on difficult test items, lowering
the effects of guessing and decreasing test error.


A MONTE CARLO COMPARISON OF THREE OPTIMAL TEST SCORING PROCEDURES


Steven Gorman


Differentially weighting items to improve the psychometric properties of
multiple-choice aptitude or achievement test scores has long been of theoretical
and practical concern. Applications of item weighting procedures which were
empirically developed or based on classical test theory did not generate the
desired psychometric properties (Bejar & Weiss, 1977; Downey, 1976).

Lord (1968) successfully applied item weighting based on the three parameter
logistic model developed by Birnbaum (1968) to the Verbal Scholastic Aptitude
Test (VSAT). The effect of this procedure is to differentially weight test
items as a function of examinee ability on the basis of their item
discriminatory power, difficulty, and susceptibility to guessing. Accordingly,
low ability examinees will be given very low weights on difficult test items,
thus lowering the effects of guessing and decreasing test error. The use of
optimal item weights eliminates more test score error at low ability ranges,
where guessing on difficult items is more prevalent. Because of the difficulty
(until recently) of accurately estimating these item parameters, and the
computations involved in using Lord's procedure, this optimal item weighting
method has not been widely adopted. The accurate estimation of the three
parameters of test items has become commonplace with the development of several
new computer programs, namely LOGIST (Wood, Wingersky, & Lord, 1976), ANCILLES
(Croll & Urry, in preparation) and FIT7TP (Gugel, in preparation). Three
FORTRAN computer programs, MAXLIKE, BAZEMODAL, and OWENSTAT, have been developed
to estimate ability based on the use of optimal scoring weights.

The present study simulates a live testing situation by introducing item
parameter estimation error. It compares the performance of conventional unit
weight scoring with three optimal item weighting methods using the item
parameters estimated by ANCILLES. The data in this investigation were produced
with a Monte Carlo procedure which generated examinee item responses.

TECHNIQUES  INVESTIGATED 


The psychometric characteristics of test scores for three optimal item weight
scoring procedures and a unit weighting scoring method were investigated using
three idealized types of test distributions.


The three optimal scoring procedures are based on Birnbaum's three parameter
logistic model, which states that the probability of a correct response given an
ability level is:

$$P_i(\theta) = c_i + (1 - c_i)\left[1 + \exp\{-1.7\,a_i(\theta - b_i)\}\right]^{-1} \tag{1}$$

where $a_i$ is the item discriminatory power, $b_i$ is the item difficulty, and
$c_i$ is the item coefficient of guessing.

The three optimal scoring procedures and a conventional scoring procedure are
described in the following paragraphs.


Owen  Bayesian  Scoring 


This method is described in Gorman (1979) and based on Owen (1975). The
method uses the Owen algorithm scoring procedure in a sequential rather than
adaptive mode, in the computer program OWENSTAT.

Bayes  Modal  Scoring 

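This procedure is based on the work of Samejima (1969) and assumes that
ability is normally distributed. The ability estimate is the value of $\theta$
that maximizes:

$$B_v(\theta) = N(\theta)\prod_{i=1}^{n} P_i(\theta)^{u_i}\,Q_i(\theta)^{1-u_i} \tag{2}$$

where $N(\theta)$ is the normal (Gaussian) distribution, $P_i(\theta)$ is the
probability of a correct response to item $i$ given ability $\theta$,
$Q_i(\theta) = 1 - P_i(\theta)$, $u_i$ is the scored response of examinee $v$ to
item $i$ (1 if correct, 0 if incorrect), and $n$ is the number of items.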

Maximum  Likelihood  Scoring 


This procedure, described in Lord & Novick (1968), states that the ability
estimate is that value which maximizes $L_v(\theta)$ in:

$$L_v(\theta) = \prod_{i=1}^{n} P_i(\theta)^{u_i}\,Q_i(\theta)^{1-u_i} \tag{3}$$

where $L_v(\theta)$ is the likelihood of examinee $v$'s responses and the value of
$\theta$ that maximizes it is the maximum likelihood ability estimate.

The above two scoring equations were solved in the computer programs
BAZEMODAL and MAXLIKE by use of a modified Newton-Raphson algorithm.
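
To make the two estimators concrete, the following Python sketch maximizes the log of equations 2 and 3 for one response pattern. It is an illustration under stated assumptions rather than a reproduction of BAZEMODAL or MAXLIKE: a plain grid search over $\theta$ is swapped in for the modified Newton-Raphson algorithm, the prior is taken to be standard normal, and the five items and their parameters are invented.

```python
import math

D = 1.7  # scaling constant from equation 1


def p_correct(theta, a, b, c):
    """Three parameter logistic probability of a correct response (equation 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log of equation 3 for one examinee's 0/1 responses."""
    total = 0.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def log_posterior(theta, items, responses):
    """Log of equation 2, assuming a standard normal prior (constant dropped)."""
    return log_likelihood(theta, items, responses) - 0.5 * theta * theta


def estimate(objective, items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid search for the theta that maximizes the objective (an assumption:
    the original programs used a modified Newton-Raphson algorithm)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: objective(t, items, responses))


# Hypothetical five-item test: (a_i, b_i, c_i) for each item.
items = [(1.0, b, 0.15) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]
responses = [1, 1, 1, 0, 0]

ml_estimate = estimate(log_likelihood, items, responses)
modal_estimate = estimate(log_posterior, items, responses)
print(ml_estimate, modal_estimate)
```

Because the prior term only penalizes distance from zero, the Bayes modal estimate can never lie further from the prior mean than the maximum likelihood estimate found on the same grid. For an all-correct pattern the likelihood has no interior maximum, so the grid search simply returns its upper bound, whereas the Bayes modal estimate stays finite.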

Z-Score  Transformation 

This is a conventional unit weighting scoring procedure where:

$$Z = \frac{X - \bar{X}}{S.D.} \tag{4}$$

where $X$ is the examinee's raw test score, $\bar{X}$ is the average raw test score,
S.D. is the standard deviation of raw scores, and $Z$ is the ability estimate.


METHODOLOGY 


Ideal tests were constructed consisting of two levels of item quality, low
($a_i$ = 0.3) and high ($a_i$ = 1.6). Three test types were developed which varied
the item difficulty ($b_i$) distributions in order to provide maximum test
information (Birnbaum, 1968) at:

(1) evenly distributed values over the ability range $\theta$ = -2.5 to +2.5
(rectangular),

(2) midpoints of even-sized areas of the Gaussian distribution (normal), and

(3) the ability mean (peaked).

The susceptibility to guessing, $c_i$, was set for all test items at .13, a
reasonable average for a five alternative multiple choice test (based on
Jensema, 1976).

Data were generated in accordance with Birnbaum's (1968) three parameter
logistic model (equation 1) in two parts:

(1) creating examinee responses for the estimation of item
parameters, and

(2) creating examinee responses to all test items with known item
parameters.

When the item parameters are specified, the probability of a correct response
is strictly a function of the simulated examinee's (sim's) ability. To generate
dichotomous responses, the probability of a correct response is determined for
the examinee's ability from equation 1. If this probability is greater than or
equal to a random number between 0 and 1, then a correct response is generated;
otherwise an incorrect response is generated. The item parameters for these items,
along with items from a separate study (Gorman, 1980), were then estimated by
ANCILLES based upon the administration of tests of 81 item length to 2000 sims
representing a normal population. These estimated item parameters were used in
the three optimal scoring procedures.
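
The response generation rule just described can be sketched as follows. This is a minimal illustration, not the original FORTRAN: Python's standard `random` module stands in for the generator used in the study, and the seed, the five-item test, and the guessing value of 0.15 are assumptions made for the example.

```python
import math
import random

D = 1.7  # scaling constant from equation 1


def p_correct(theta, a, b, c):
    """Three parameter logistic probability of a correct response (equation 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def simulate_responses(theta, items, rng):
    """One sim's 0/1 responses: correct when P(theta) >= a uniform random draw."""
    return [1 if p_correct(theta, a, b, c) >= rng.random() else 0
            for a, b, c in items]


rng = random.Random(1981)  # arbitrary seed, chosen only for repeatability

# Hypothetical test: five high-quality items spread over the ability range.
items = [(1.6, b, 0.15) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]

# 2000 sims drawn from a standard normal ability distribution.
abilities = [rng.gauss(0.0, 1.0) for _ in range(2000)]
data = [simulate_responses(theta, items, rng) for theta in abilities]

proportion_correct = [sum(row[j] for row in data) / len(data)
                      for j in range(len(items))]
print(proportion_correct)
```

A response matrix built this way would then be passed to an item parameter estimation program; the estimates, rather than the known parameters, carry the estimation error that the study sets out to include.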

STUDY  1 

In Study 1, a group of 800 normally distributed sims, generated by the
LLRANDOM computer program (Learmonth & Lewis, 1978), was administered all
tests. The criterion evaluated in Study 1 was the fidelity coefficient, the
correlation of known ability with ability estimated by each scoring method on
the 20 and 80 item tests. This statistic has been widely used in other testing
research. Samejima (1977) decries the use of this statistic, stating that since
it is a correlation coefficient, it is dependent not only upon the test, but
also upon the specific group of examinees tested. Testing the same sims from a
normally distributed population will hold constant across test scoring methods
the effect of the ability distribution of examinees. It should be demonstrable
that a theoretical test can have a high fidelity coefficient, yet have poorer
psychometric properties as a function of ability than other theoretical tests.

RESULTS 

The  results  of  Study  1  are  displayed  in  Table  1.  The  fidelity  coefficients, 
the  correlations  between  known  and  estimated  ability,  are  typically  greater  for 
tests  of  all  types,  lengths,  and  item  qualities  scored  with  the  optimal  scoring 
methods.  The  only  exception  is  with  the  peaked  high  item  quality  tests,  where 
the conventional scoring procedure ordered examinees better, on average, than
with  the  maximum  likelihood  procedure.  On  the  low  item  quality  tests,  the 
ordering  of  examinees  is  greatest  with  the  peaked  tests.  On  the  high  item 
quality  test,  the  normal  tests  ordered  examinees  best.  Note  that  the  20  item 
normal  test  with  higher  item  quality  scored  with  any  of  the  optimal  scoring 
procedures  has  higher  fidelity  coefficients  than  the  80  item  conventionally 
scored  test.  Also  note  that  as  test  length  and  item  quality  increase,  the 
fidelity  coefficients  of  the  optimal  scoring  procedures  become  much  greater  than 
those  of  the  conventional  method. 


TABLE 1

Correlations between Known Ability and Ability Estimated by Four
Scoring Methods on Three Types of Test Distributions*

                      Low a_i              High a_i
Test Type           R     P     N        R     P     N

20 ITEM TEST
BAZEMODAL          842   891   877      927   906   943
MAXLIKE            859   888   874      926   876   941
OWENSTAT           840   892   875      922   911   959
Z-SCORE            828   888   869      908   904   929

50 ITEM TEST
BAZEMODAL          907   925   925      949   926   954
MAXLIKE            905   921   925      949   891   954
OWENSTAT           905   924   922      944   924   949
Z-SCORE            894   919   918      955   912   933

*Decimals omitted


DISCUSSION 

With low item quality, the 20 item peaked test and the 50 item peaked and
normal tests provided the greatest fidelity coefficients, the correlation
between the ability estimates and known ability. With high item quality, the
normal tests have the greatest fidelity coefficient. This follows from item
response theory, which holds that item information becomes more leptokurtic as
item discriminatory power ($a_i$) increases. Thus, since item information is
additive, a peaked test with low $a_i$ values should differentiate examinees over a
much broader range than a peaked test with high $a_i$ values.

This study has shown that on this one criterion, the non-rectangular tests
are better average measures of ability. Study 2 will examine two other
psychometric properties of these test scoring methods.

STUDY  2 

In Study 2, the sample consisted of 100 sims at each of 11 evenly spaced
ability values on the ability continuum -2.5 to +2.5. The instruments used were
the three types of 50 item tests. These sims provided data to compute
statistics as a function of examinee ability. The criteria evaluated are score
bias and test score precision. Test score bias is the average difference
between the known examinee ability and the ability estimated by each scoring
method. Test score precision is given by the test score information value, an
indicator of the usefulness of the test scores for differentiating ability at
that ability level. Test score information is inversely related to the square
of the standard error of the ability estimate, and varies as a function of
ability.

RESULTS

The  mean  difference  between  known  ability  and  the  ability  estimated  by  the 
four  procedures  was  computed  for  the  100  sims  at  each  of  the  11  ability  levels. 
For  ease  of  comparison,  these  data  were  aggregated  by  computing  an  average  of 
the  absolute  values  of  the  11  score  bias  figures.  These  values  are  located  in 
Table  2.  The  data  show  that  the  bias  is  lower  (excepting  the  Bayes  modal  scored 
peaked  test  of  low  item  quality)  for  all  optimal  scoring  methods  for  all  test 
types  and  levels  of  item  quality  than  with  conventional  scoring.  The  bias  is 
most  severe  with  the  peaked  test,  and  this  bias  increases  with  higher  item 
discriminatory  power. 
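
The aggregation described above can be written out as a short Python sketch. The three ability levels and the estimates below are invented for illustration; the study itself used 100 sims at each of 11 levels.

```python
def average_absolute_bias(known_levels, estimates_by_level):
    """Mean over ability levels of |mean(known ability - estimated ability)|."""
    absolute_biases = []
    for theta, estimates in zip(known_levels, estimates_by_level):
        mean_difference = sum(theta - e for e in estimates) / len(estimates)
        absolute_biases.append(abs(mean_difference))
    return sum(absolute_biases) / len(absolute_biases)


# Hypothetical data: estimates pulled toward zero at the extremes.
known_levels = [-1.0, 0.0, 1.0]
estimates_by_level = [
    [-0.8, -0.6],   # bias at theta = -1.0 is -0.3
    [0.1, -0.1],    # bias at theta =  0.0 is  0.0
    [0.7, 0.9],     # bias at theta = +1.0 is +0.2
]

print(average_absolute_bias(known_levels, estimates_by_level))
```

Taking absolute values before averaging matters because biases of opposite sign at low and high ability, as in this example, would otherwise largely cancel.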

TABLE 2

Average Absolute Value of Score Bias for Ability Estimated
by Four Scoring Methods on Three Types of Test Distributions*

                      Low a_i              High a_i
Test Type           R     P     N        R     P     N

BAZEMODAL           1?    27    18       13    38    14
MAXLIKE             11    17    09       12    25    12
OWENSTAT            11    17    10       15    30    14
Z-SCORE             20    26    21       13    44    22

*Decimals omitted
? Digit illegible in the source


Two  criteria  need  to  be  measured  when  evaluating  test  score  precision.  One 
is  the  average  precision  (computed  by  estimated  test  score  information),  and  the 
other  is  the  equiprecision  over  the  ability  range.  Since  the  items  for  all 
types  of  test  distributions  have  the  same  item  discriminatory  power  and 
susceptibility  to  guessing,  all  tests  share  identical  potential  test  score 
information.  The  test  score  information  could  only  be  measured  over  the  range 
-2.0  to  +2.0  with  this  research  design,  thus  some  of  the  test  score  information 
for  the  non-peaked  tests  is  not  measured.  Therefore,  average  precision,  listed 
in  Table  3,  should  only  be  reviewed  within  test  types.  For  the  tests  consisting 
of  items  with  low  item  discriminatory  power,  all  scoring  procedures  yielded 
roughly  the  same  average  information.  However,  with  the  tests  of  higher  quality 
items,  the  optimal  scoring  procedures  provided  greater  information  than  with 
conventional  scoring,  except  with  the  peaked  test. 

TABLE 3

Average Test Score Information for Four Scoring Methods on
Three Types of Test Distributions

                       Low a_i                  High a_i
Test Type           R      P      N          R       P       N

BAZEMODAL          4.17   4.98    --        9.85   10.72   12.47
MAXLIKE            4.78   5.?0   4.81       9.62   11.04   12.61
OWENSTAT           4.40   4.97    --        9.12   10.45   12.08
Z-SCORE            4.16   5.7?   4.86       7.44   12.08   10.25

? Digit illegible in the source
-- Entry missing in the source; the placement of that row's two surviving
   low a_i entries is uncertain


The equiprecision of the tests by scoring method is measured by the
coefficient of variation (CV) of test score information, listed in Table 4. The
greater this CV value, the less equiprecise the test score information. The
peaked tests provide the greatest CV values, with the optimal scoring procedures
yielding greater values than with conventional scoring. The peakedness of
information increases, as expected, with the higher item discriminatory power
items. The rectangular tests give more even test score precision, with this
phenomenon more evident on the higher item discriminatory power test. Although
the conventional scoring of the rectangular high item quality test has lower CV
values than with optimal scoring, the three optimal scoring methods all yielded
greater information at all nine ability levels than with conventional scoring.
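
As a small worked example of the equiprecision index, the sketch below computes a coefficient of variation for two invented information profiles over nine ability levels. The paper does not state whether a population or sample standard deviation was used or whether the values in Table 4 are percentages; the population form expressed as a percentage is assumed here.

```python
import statistics


def coefficient_of_variation(information):
    """CV in percent: population standard deviation divided by the mean."""
    return 100.0 * statistics.pstdev(information) / statistics.mean(information)


# Hypothetical test score information at nine ability levels (-2.0 to +2.0).
rectangular_profile = [9.0, 10.0, 10.5, 11.0, 11.0, 11.0, 10.5, 10.0, 9.0]
peaked_profile = [1.0, 3.0, 9.0, 20.0, 30.0, 20.0, 9.0, 3.0, 1.0]

print(coefficient_of_variation(rectangular_profile))
print(coefficient_of_variation(peaked_profile))
```

A flat profile gives a CV near zero and a sharply peaked one gives a large CV, so the index says nothing about the level of information. This is why a method can have a slightly higher CV and still provide more information at every ability level.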

TABLE 4

Coefficient of Variation of Test Score Information for Four
Scoring Methods on Three Types of Test Distributions

                      Low a_i              High a_i
Test Type           R     P     N        R     P     N

BAZEMODAL           24    59    25       29   125    55
MAXLIKE             25    6?    29       19   156    55
OWENSTAT            27    60    24       22   128    49
Z-SCORE             29    52    56       16    94    52

? Digit illegible in the source


DISCUSSION 

These  optimal  scoring  procedures  weight  items  as  a  function  of  their  item 
information  (Birnbaum,  1968),  the  contribution  of  each  item  to  decrease  test 
score  error  at  each  ability  level.  The  item  information  is  a  function  of  the 
item's $a_i$, $b_i$, and $c_i$ parameters. Since the $a_i$ and $c_i$ values are fixed in each
test,  the  full  capacity  of  these  scoring  procedures  is  not  being  demonstrated. 
Both  studies  only  show  the  capacity  of  these  procedures  to  effectively  weight 
items  as  a  function  of  their  appropriateness  in  difficulty  relative  to  the 
examinee's  ability,  not  their  capacity  to  weight  items  as  item  discriminatory 
power  and  item  coefficient  of  guessing  vary.  In  spite  of  this  shortcoming,  the 
optimal  scoring  procedures  show  a  significant  increase  in  their  ability  to 
successfully  order  the  ability  of  examinees  relative  to  the  conventional  scoring 
procedure.  The  optimal  scoring  procedures  also  measure  examinees  with  more 
precision  and  less  bias  than  the  conventional  means. 

This  study  also  assumes  that  the  multiple  choice  test  has  five  response 
alternatives. For tests with only four item choices, or where the chance of
successful guessing is greater than .15, the scoring properties would tend to
diminish. The optimal scoring procedure properties would likely diminish
slightly, while the conventional scoring properties would drop more
significantly. This is due to the optimal weighting procedures' capacity to
diminish the effect of test score error due to guessing, while conventional
scoring procedures are less capable of reducing the effects of guessing.

CONCLUSION


In  this  Monte  Carlo  study,  three  optimal  test  scoring  procedures  using 
estimated  item  parameters  provided  better  psychometric  properties  of  ability 
estimates  than  the  conventional  procedure.  The  optimal  scoring  procedures 
provided  the  greatest  advantage  over  conventional  scoring  on  rectangular  tests 
composed  of  items  with  high  discriminatory  power.  Test  publishers  should 
seriously  consider  using  one  of  these  optimal  methods  to  score  their  multiple 
choice  examinations.  With  the  ready  availability  of  computer  programs  to 
estimate  item  parameters  and  optimally  score  tests,  the  benefits  of  enhanced 
measurement  of  ability  should  outweigh  the  slight  increase  in  computer  costs. 
This  study  also  showed  that  the  fidelity  coefficient  criterion  is  only  a  group 
psychometric  indicator,  and  does  not  show  the  capacity  of  the  test  to  measure 
low  and  high  ability  examinees.  This  criterion  should  be  used  with  caution  to 
avoid making erroneous conclusions.


AIR FORCE CIVILIAN PROMOTION APPRAISAL SYSTEM DEVELOPMENT


R.  Bruce  Gould,  PhD 
Chief,  Performance  Evaluation  Section 
Air  Force  Human  Resources  Laboratory 


A  system  for  ranking  civilian  promotion  eligibles  was 
developed  to  meet  legal  and  operational  requirements. 
Demographic  data  and  supervisor  and  peer  ratings  were 
obtained  on  a  20,000  case  target  sample  to  establish  a 
pool  of  potential  ranking  elements.  Variable  reduction 
procedures  eliminated  those  elements  which  were 
redundant,  showed  greatest  potential  for  adverse  impact, 
or were less defensible legally or psychometrically. The
remaining  variables  were  submitted  to  12-member  promotion 
data  panels  for  each  of  23  homogeneous  job  family 
clusters.  Policy  capturing  techniques  resulted  in 
regression  weighted  promotion  algorithms  which  use 
between  5  and  18  behavioral  dimensions,  depending  on  the 
job  family  and  supervisory  or  nonsupervisory 
classification  of  the  job,  to  rank  promotion  candidates. 
Correlations  between  promotion  panel  rankings  and 
algorithm predicted rankings ranged from .89 to .96,
indicating consistent panel member policies and high
interrater agreement. Separate reliability panels
convened for two job families resulted in correlations of
.94 to .98 between their rankings and those of the
algorithm development panels. A single supervisory
rating form and 40 job-type specific algorithms have been
recommended for operational implementation.

The  Air  Force  Human  Resources  Laboratory  began  development  of  the 
Civilian  Promotion  Appraisal  System  (CPAS)  in  July  1977  at  the  request 
of  the  Directorate  of  Civilian  Personnel.  The  system  was  to  provide  a 
rating  process  for  determining  promotion  potential  and  rank  ordering 
all  General  Schedule  (GS)  and  Federal  Wage  Scale  promotion  eligibles.
The  rating  system  would  be  the  last  ranking  stage  in  the  Air  Force 
Promotions  Placement  and  Referral  System  (PPRS).  The  earlier  stages 
screen  candidates  on  basic  eligibility,  e.g.,  satisfying  minimum 
skills,  knowledge,  and  experience  requirements  established  by  a  job 
analysis  of  each  specific  vacant  position.  CPAS  was  to  restore 
credibility  to  the  civilian  promotion  appraisal  process,  be 
operationally  efficient,  and  manage  rating  inflation. 

The  research  approach  was  designed  to  meet  constraints  set  by  the  user 
and  was  subsequently  found  to  meet  later  constraints  imposed  on 
promotion  systems  by  legal  and  higher  authority  directives.  In  1978, 
the  Office  of  Personnel  Management  (OPM)  directed  that  promotion
systems  only  use  those  aspects  of  current  job  performance  ratings 
which  are  related  to  the  specific  new  job  in  promotion  actions. 
Public  Law  95-454,  otherwise  known  as  the  Civil  Service  Reform  Act  of
1978,  stated  that  task-level  performance  measures  must  be  a  basis  for
promotion  actions.  The  Equal  Employment  Opportunity  (EEO)  and  Uniform
Guidelines  for  Employee  Selection  (1978)  dealt  with  adverse  impact 
issues  and  specified  selection  system  developmental  actions  which  must 
be  taken  to  minimize  adverse  impact  while  maximizing  validity.  These 
constraints  plus  a  review  of  case  law  (Cascio  &  Bernardin,  1981) 
mandate  that  the  new  promotion  system  must  not  rely  on  subjective
measures  of  workers’  affective  states,  e.g.,  traits  or  characteristics,
and  must  not  use  overall  job  performance  measures  since  they
contain  performance  elements  not  demonstrated  to  be  associated  with
the  new  job.  Likewise,  the  performance  potential  measures  could  not
ask  raters  to  project  performances  of  job  aspects  which  were  beyond
those  observable  in  the  current  jobs.

Legal  defensibility  of  the  promotion  system  was  a  major  goal  of  the
project.  Under  the  Uniform  Guidelines  (1978),  any  system  which 
results  in  adverse  impact  must  have  fulfilled  several  developmental 
requirements  to  absolve  the  user  from  liability.  The  probability  that 
any  major  selection  system  will  have  adverse  impact  is  sufficiently 
high  that  all  such  systems  should  be  developed  adhering  to  the 
guidelines.  To  satisfy  the  guidelines,  all  possible  selection
components  must  have  been  considered  and  systematically  selected  to
provide  a  system  which  maximizes  objectivity  and  validity  while
keeping  to  a  minimum  elements  which  will  contribute  to  adverse
impact.  Elements  showing  the  least  amount  of  bias  must  be  selected
but  not  to  the  point  of  sacrificing  validity.  Where  criterion
validity  is  not  available,  construct  and  content  validity  should  be
used  to  develop  the  system  and  criterion  validation  should  become  an
ongoing  process  during  system  use.

This  report  outlines  the  procedures  used  to  develop  the  experimental
data  collection  devices,  organize  the  civilian  jobs  into  a  manageable
number  of  job  families,  collect  the  data,  select  the  variables  for
inclusion  in  the  final  promotion  system,  and  develop  the  operational
algorithms  or  formulas  for  ranking  candidates,  and  recommends  steps
for  managing  rating  inflation.

Data  Collection  Instruments 

Three  experimental  data  collection  instruments  were  developed.  The
instruments  contained  items  for  research  purposes  as  well  as  potential 
promotion  ranking  elements.  Literature  surveys,  comparisons  of 
existing  Government  and  industry  rating  forms,  job  analyses  conducted 
in  similar  military  occupations,  and  experimenter  subjective  judgments 
provided  the  pool  of  potential  promotion  ranking  elements  from  which  a
demographic  questionnaire  and  a  rating  form  were  developed.  The
104-item  Demographic  Questionnaire  captured  traditional  demographic
information  such  as  age,  sex,  and  ethnic  background;  promotion,
education,  and  job  training  history;  time  in  service,  grade,  and
on-the-job  measures;  and  interest  in  and  attitudes  toward  promotion
incentives  such  as  educational  activities  the  employee  was  willing  to
engage  in  to  become  promotion  eligible.  The  106-item  Worker
Characteristic  Rating  Booklet  used  9-point  adjectival  anchored  rating
scales  to  obtain  ratings  on  a  wide  range  of  operationally  defined
worker  job  performance  measures  (skills  and  knowledges).  Also 
included  were  ratings  of  overall  job  performance  and  predicted 
performance  in  the  next  higher  job  and  scores  on  paper-and-pencil
measures  of  verbal,  quantitative,  mechanical,  electrical,  and  spatial
aptitude.  These  latter  measures  were  included  to  permit  an  evaluation
of  rater  accuracy.

The  third  instrument  was  the  Civilian  Personnel  Examination  (CPE)
which  was  designed  to  measure  the  workers'  quantitative,  verbal,
spatial,  mechanical,  electrical,  and  administrative  aptitudes.  The
CPE  was  a  revised  version  of  the  Air  Force  Qualifying  Examination, 
Form  J  (Vitola,  Massey,  &  Wilbourn,  1971).  The  CPE  was  not  a 
candidate  component  for  the  promotion  system  but  again  provided  a 
means  for  assessing  rater  accuracy  and  determining  the  reliability  of 
specific  skills  and  knowledge  ratings. 

Job  Family  Specifications 


To  obtain  a  manageable  number  of  logical  and  homogeneous  clusters  of 
job  types,  the  some  1,500  Air  Force  civilian  job  series  were  combined 
into  23  job  families.  Eight  position  classifiers  and  research 
psychologists  were  divided  into  two  independent  panels  and  instructed 
to  arrange  the  1,500  job  series  into  some  20  homogeneous  clusters 
based  on  task  subject  matter  and  job  requirements.  Their  results  were 
then  compared  and  differences  resolved.  There  was  near  perfect 
overlap  between  their  clusters.  The  joint  session  resulted  in  23  job
family  groupings.  There  were  three  professional,  eight  technical, 
four  administrative,  two  clerical,  and  six  trades  and  crafts  job 
families. 

Data  Collection 


A  stratified  random  sample  of  20,000  target  workers  from  the  784  most
populous  CONUS  bases  was  selected.  The  sample  was  stratified  by
grade,  series,  sex,  and  ethnic  category  with  800  to  1,000  cases
representing  each  of  the  23  job  families.  Minority  members  were
deliberately  overrepresented  in  the  sample  so  that  cell
sizes  would  be  adequate  for  bias  and  adverse  impact  analyses.  The
Demographic  Questionnaire,  with  a  self-addressed  envelope  to  maintain
privacy,  was  sent  to  each  of  the  20,000  target  workers.  Each  of  their
supervisors  received  a  Worker  Characteristics  Rating  Booklet  as  did  one  of
each  target  worker's  peers.

Statements  outlining  the  "Research  Purposes  Only"  use  of  the  ratings 
and  the  nature  and  purpose  of  the  entire  research  project,  and  the 
efforts  of  local  project  monitors  are  credited  with  the  enthusiastic 
support  of  the  participants.  Seventy-one  percent  of  the  materials 
were  returned  with  65%  of  the  target  workers  (N  =  12,865)  having 
complete  matched  and  usable  data  sets.  The  substantial  variance  in
the  behavioral  performance  element  ratings,  constructive  write-in
comments,  and  completeness  of  the  data  attest  to  the  support  workers
and  supervisors  gave  to  the  promotion  system  development  effort. 


The  CPE  was  administered  to  2,000  of  the  20,000  target  workers  to 
obtain  aptitude  measures  for  the  rater  accuracy  studies.  Worker 
characteristics  ratings  were  obtained  from  peers  to  aid  in  these 
analyses  and  provide  criterion  measures.  The  rater  accuracy  results 
go  beyond  the  purpose  of  this  paper  except  to  say  that  supervisors  and 
peers  were  very  effective  in  predicting  performances  on  aptitude  tests 
and  there  were  no  significant  or  practical  differences  between  their 
rating  effectiveness. 

Variable  Selection 


To  reduce  the  210  demographic  and  rating  variables  to  a  workable 
subset,  a  large  data  matrix  was  constructed  for  rating  each  of  the 
elements  on  a  series  of  selection  criteria.  Four  basic  types  of 
criteria  were  used:  (1)  validity,  (2)  uniqueness,  (3)  bias,  and  (4)
legal  defensibility.  Correlations  of  the  variables  with  peer  ratings
of  overall  performance  and  projected  performance  at  the  next  higher 
levels  served  as  estimates  of  item  validity.  Comparisons  of  mean 
values  by  race  and  sex  category  were  used  to  evaluate  item  bias. 
Intercorrelations  between  items  were  used  to  identify  items  capturing 
unique  and  redundant  variance.  Thus  objective  measures  of  each  item's 
validity,  uniqueness,  and  bias  were  available.  Subjective  ratings  of 
legal  defensibility  were  made  by  personnel  specialists  and  research 
psychologists  knowledgeable  in  case  law. 

A  panel  of  eight  research  psychologists  and  two  personnel  specialists 
was  convened  to  select  the  workable  subset  of  variables  from  the  data 
matrix.  Their  goal  was  to  select  items  which  were  valid,  minimized 
bias  and  hence  potential  adverse  impact,  contributed  unique  variance, 
and  had  not  been  demonstrated  to  be  legally  indefensible.  First,  all 
items  which  were  not  legally  defensible  were  deleted.  This  limited 
the  pool  to  observable  behaviors  and  deleted  demographic  variables
which  are  highly  related  to  past  opportunities  and  associated  with 
past  discriminations.  Then  judgments  were  made  to  select  the  final 
subset  of  24  rating  elements.  There  were  three  overall  ratings  (job 
performance,  supervisory,  managerial);  12  behavioral  ratings  such  as 
responsiveness  to  directions,  and  self-sufficiency;  three  ability 
ratings  (quantitative,  reading,  data  interpretation);  four  motivation 
indicators  (productivity,  initiative,  speed  of  completion,  and  amount 
of  working  time  spent  in  productive  efforts);  and  two  composite 
variables  which  combined  behavioral  or  motivational  measures.  The 
above  categories  are  used  for  summary  purposes  only.  In  the  actual 
rating  elements,  only  operational  definitions  are  used  which  describe
observable  performances  which  are  related  to,  say,  managerial  behavior,
rather  than  using  the  words  "managerial  performance."  By  this 
procedure,  individuals  who  are  not  managers  can  still  be  rated  on 
observable  behaviors  which  are  related  to  managerial  performance. 

Promotion  Algorithms 


Panels  of  11  to  12  subject  matter  specialists  (SMSs)  were  convened  for 
each  of  the  23  job  families  to  select  the  final  set  of  factors  to  be 
used  for  rating  and  ranking  promotion  eligibles  in  their  job  family.
For  each  panel,  stratified  random  sampling  techniques  were  used  to
balance  representation  by  race  (Black,  Hispanic,  other),  sex,  and 
supervisory/nonsupervisory  status.  Selectees  proportionally  represented
job  series,  locality,  and  major  commands  while  meeting  minimum
experience  and  availability  requirements. 

Each  panel's  first  task  was  to  review  and  discuss  the  24  prospective
rating  elements  and  then  to  select  a  set  of  6  to  10  elements  which  were
observable  on-the-job  in  their  job  family  and  were  related  to  ability
to  perform  at  the  next-level  nonsupervisory  job.  A  second  set  was
selected  for  supervisory  jobs.  Data  decks  were  then  produced  which 
contained  only  the  selected  rating  elements.  The  data  pertained  to 
workers  in  the  original  20,000  cases  who  were  members  of  the  specific 
job  family. 

The  panel's  second  task  was  to  sit  as  a  promotion  board  using 
procedures  similar  to  those  used  by  military  boards.  Members  were 
given  decks  of  70  to  80  cases  and  asked  to  rank-order  the  cases 
according  to  their  promotability  or  potential  to  perform  in  the  next 
higher-level  job.  Decks  were  labeled  according  to  specific  grade 
level  and  supervisory/nonsupervisory  nature  of  the  candidate  job. 
Each  member  received  9  to  15  data  decks  to  rank  order.  Wage  grade  job 
families  considered  only  nonsupervisory  jobs  while  the  one  wage 
supervisory  family  considered  only  supervisory  jobs.  This  accounts 
for  the  small  number  of  decks  considered  by  some  job  families. 
Unknown  to  the  participants,  one  duplicate  deck  was  included  so 
inconsistent  raters  could  be  identified.  An  additional  two  panels 
were  convened  to  replicate,  and  cross-validate,  the  results  of  two  of
the  job  family  panels. 
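
The  paper  does  not  say  how  the  two  rankings  of  the  duplicate  deck  were  compared.  One  plausible  check,  offered  only  as  an  illustrative  sketch,  is  a  Spearman  rank  correlation  between  a  member's  ranking  of  the  original  deck  and  of  its  hidden  duplicate.  The  function  name  and  the  sample  ranks  below  are  hypothetical,  and  the  formula  assumes  rankings  without  ties.

```python
def spearman_rho(first_ranks, second_ranks):
    """Spearman rank correlation for two untied rankings of the same cases."""
    n = len(first_ranks)
    if n != len(second_ranks) or n < 2:
        raise ValueError("need two rankings of the same two or more cases")
    d_squared = sum((a - b) ** 2 for a, b in zip(first_ranks, second_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical ranks one panel member gave the same six cases, first in the
# original deck and then in the hidden duplicate deck.
original_deck = [1, 2, 3, 4, 5, 6]
duplicate_deck = [2, 1, 3, 4, 6, 5]

consistency = spearman_rho(original_deck, duplicate_deck)
print(round(consistency, 3))
```

A  value  near  1  would  indicate  a  rater  who  applied  the  same  policy  both  times;  where  to  draw  the  line  for  "inconsistent"  is  a  judgment  the  paper  leaves  unstated.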

Judgment  analysis  techniques  (Christal,  1963)  were  used  to  capture  the 
promotion  policy  of  each  individual  and  the  aggregate  policy  of  each 
panel.  Of  277  SMSs,  only  two  were  removed  from  the  analyses  because 
of  inconsistent  within-rater  policies.  Judgment  analysis  techniques 
use  the  available  data  elements  (6  to  10)  as  predictors  and  each 
case's  rank  order  as  the  criterion  to  obtain  regression  weights  and 
multiple  Rs.  The  sum  of  the  regression  weights  times  the  respective 
ratings  all  added  to  a  regression  constant  yields  each  case's 
predicted  rank.  The  correlation  (R)  between  the  predicted  and  actual 
ranks  indicates  the  efficiency  with  which  a  panel's  policy  was
captured.
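
The  computation  described  above  can  be  sketched  as  an  ordinary  least-squares  regression:  predicted  rank  =  constant  +  sum  of  (weight  x  rating),  with  R  taken  as  the  correlation  between  predicted  and  actual  ranks.  This  is  a  minimal  sketch,  not  the  judgment  analysis  software  actually  used;  the  function  names,  the  three  rating  elements,  and  the  eight  cases  are  hypothetical,  and  the  fit  is  done  through  the  normal  equations  to  keep  the  example  self-contained.

```python
from math import sqrt


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def capture_policy(ratings, ranks):
    """Least-squares fit of ranks on ratings; returns (constant, weights)."""
    rows = [[1.0] + list(case) for case in ratings]  # leading 1.0 carries the constant
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ranks)) for i in range(k)]
    coef = solve_linear(xtx, xty)
    return coef[0], coef[1:]


def predict_rank(constant, weights, case):
    """Regression constant plus the sum of weight times rating."""
    return constant + sum(w * x for w, x in zip(weights, case))


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical deck: 9-point ratings on three elements for eight cases,
# and the rank order (1 = most promotable) a panel member assigned.
ratings = [(9, 8, 7), (8, 8, 6), (7, 6, 8), (6, 7, 5),
           (5, 5, 6), (4, 6, 3), (3, 3, 4), (2, 4, 2)]
ranks = [1, 3, 2, 4, 6, 5, 8, 7]

constant, weights = capture_policy(ratings, ranks)
predicted = [predict_rank(constant, weights, case) for case in ratings]
multiple_r = pearson(predicted, ranks)
print(constant, weights, multiple_r)
```

Because  rank  1  is  the  best  case  in  this  sketch,  elements  that  raise  promotability  receive  negative  weights;  reversing  the  rank  scale  would  flip  the  signs  without  changing  R.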

Multiple  Rs  ranged  from  .89  to  .96  indicating  the  substantial 
predictive  efficiency  with  which  the  regression  weights  captured  each 
panel's  promotion  policy.  These  values  are  particularly  noteworthy 
considering  that  any  inconsistency  within  raters  and  differences
between  panel  raters  introduce  error.  There  were  no  within  job
family  grade-level  differences  in  rating  policies,  but  there  were
significant  differences  between  all  supervisory  and  nonsupervisory
rating  policies.  Forty  algorithms  were  developed,  18  supervisory  and
22  nonsupervisory.  The  two  cross-validation  panels  resulted  in
correlations  of  .94  to  .98  between  their  rankings  and  those  of  the
algorithm  development  panels. 

Regression  weights  are  difficult  to  interpret  or  explain  to  users  and
promotion  candidates,  so  alternative  weighting  systems  were
investigated.  A  unit  weighting  scheme  coding  raw  score  weights  less
than  .5  as  "0"  and  all  other  weights  as  "1"  produced  virtually
identical  results  and  was  adopted  for  operational  use.  Some  of  the
rating  elements  received  "0"  weights  in  all  algorithms  and  the  number
of  operational  behavioral  dimensions  was  reduced  to  19.
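
As  a  minimal  sketch  of  the  unit  weighting  rule,  the  code  below  recodes  raw  score  weights  at  the  .5  cutoff  and  ranks  two  candidates  on  the  resulting  score.  The  raw  weights,  the  ratings,  and  the  function  names  are  hypothetical;  summing  the  retained  ratings  and  ranking  the  highest  sum  first  are  assumptions  about  how  unit  weights  would  be  applied,  since  the  paper  gives  only  the  recoding  rule.

```python
def unit_weights(raw_weights, cutoff=0.5):
    """Code raw score weights below the cutoff as 0 and all others as 1."""
    return [0 if w < cutoff else 1 for w in raw_weights]


def unit_weighted_score(ratings, weights):
    """Sum the ratings on the dimensions that kept a weight of 1."""
    return sum(r * w for r, w in zip(ratings, weights))


# Hypothetical raw score weights for five rating elements in one algorithm.
raw = [1.8, 0.3, 0.9, 0.5, 0.1]
weights = unit_weights(raw)

# Hypothetical 9-point ratings for two promotion candidates.
candidates = {"A": [7, 9, 6, 5, 9], "B": [8, 4, 7, 6, 3]}
ranking = sorted(
    candidates,
    key=lambda name: unit_weighted_score(candidates[name], weights),
    reverse=True,  # assumption: the highest score ranks first
)
print(weights, ranking)
```

Candidate  A's  high  ratings  on  the  second  and  fifth  elements  do  not  count  here,  because  those  elements  were  coded  "0"  for  this  algorithm.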

Operational  Rating  Form 

A  single  universal  operational  rating  form  was  recommended  for  use  for
all  GS  and  Federal  Wage  Scale  employees.  All  19  remaining  behavioral
dimensions  contain  operationally  defined  components  which  can  be  rated
for  all  employees.  Operationally,  only  those  dimensions  which  are
unit  weighted  will  be  used  in  a  specific  promotion  ranking.  The
universal  form  satisfies  the  user's  request  for  simplicity  by
eliminating  the  current  multiple  forms  and  removing  the  necessity  for
multiple  rating  forms  for  individuals  being  considered  for  positions
outside  their  current  series.  As  will  be  explained,  the  single  form
is  a  necessary  requirement  to  permit  inflation  management.

Inflation  Management 

Four  specific  recommendations  were  made  to  permit  inflation  management
of  the  ratings.  These  recommendations  concern  the  rating  scale,
rating  all  individuals  in  a  unit  at  the  same  time  using  a  universal
form,  providing  a  separation  between  job  performance  and  promotion
rating  systems,  and  providing  a  feedback  system.  The  rating  scale  is
a  9-point  behaviorally  anchored  scale  where  the  rated  individual  is
compared  to  the  average  employee.  The  average  employee  is  defined  as
being  highly  effective  and  motivated  in  the  performance  of  his/her
job.  Ratings  using  the  top  or  bottom  two  scale  points  must  be
substantiated  by  citing  specific  performance  events.

By  rating  all  individuals  in  a  unit  simultaneously,  raters  can  make
better  comparisons  among  subordinates.  Each  supervisor’s  ratings  are
to  be  summarized  on  work  sheets  and  the  summaries  submitted  with  the
ratings  to  an  indorsing  official.  Indorsing  officials  can  then  check
for  leniency,  halo,  or  response  set  type  rating  errors.  Further,  the
summaries  permit  the  indorsers  to  make  comparisons  across  raters  and
ensure  that  ratings  are  consistent  with  each  organization  unit's
productivity.  Indorsers  will  be  authorized  to  mandate  that  ratings  be
consistent  with  actual  performance.

The  greater  the  number  of  uses  for  a  rating,  the  greater  the  pressure 
on  the  supervisor  to  inflate  the  rating.  For  this  reason  the  CPAS 
ratings  are  distinct  in  time  and  purpose  from  the  job  performance 
ratings  rendered  under  the  General  Manager  Appraisal  System  (GMAS)  and 
Job  Performance  Appraisal  System  (JPAS).  The  GMAS  and  JPAS  purposes 
and  procedures  were  presented  earlier  at  this  conference  (see 
Thompson,  Cowan,  &  Guerrieri,  1981).  The  only  relationships  between 
the  systems  are  that  individuals  must  have  at  least  satisfactory  GMAS 
and  JPAS  ratings  to  be  promotion  eligible  and  that  the  ability  to  render
appropriate  CPAS  ratings  is  a  component  of  each  supervisor's  GMAS  or
JPAS  rating. 


Rating  feedback  systems  usually  provide  mean  summary  values  for
comparisons  with  peers  and  for  comparing  subordinate  units'  rating
policies.  Anecdotal  information  suggests  that  organizations  with
lower  mean  ratings  feel  they  may  have  been  unfair  to  their  employees 
and  increase  the  average  of  their  next  ratings.  The  feedback 
information  actually  increases  rather  than  decreases  inflation  thus 
creating  an  inflation  pump.  With  CPAS,  a  feedback  system  is  still 
recommended.  However,  the  recommended  system  provides  only  variances 
in  ratings,  not  mean  values.  Accompanying  guidance  will  point  out 
that  "the  more  variance  the  better."  Lack  of  variance  is  the  result 
of  poor  rating  management  and  inability  to  differentiate  performance 
within  individuals  and  between  individuals.  Inflation  reduces  score 
dispersion  hence  variance.  This  variance  yardstick  is  also  to  be  used 
to  guide  subsequent  performance  ratings  of  supervisors.  The  job 
performance  rating  systems  mandated  by  CSRA-78,  and  as  developed  by 
the  Air  Force,  required  that  all  supervisors  have  a  critical 
supervisory  ability  job  element.  Therefore,  all  supervisors  are  rated 
on  their  ability  to  rate.  Variance  feedback  will  make  supervisors 
accountable  for  their  rating  styles. 
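
A  variance-only  feedback  report  of  the  kind  recommended  could  be  sketched  as  follows.  This  is  an  illustration  under  stated  assumptions,  not  the  CPAS  feedback  procedure:  the  supervisor  labels  and  ratings  are  hypothetical,  and  the  use  of  the  population  variance  over  all  of  a  supervisor's  ratings  is  an  assumption,  since  the  paper  does  not  specify  the  statistic  beyond  "variances  in  ratings."

```python
from statistics import pvariance


def variance_feedback(ratings_by_supervisor):
    """Report each supervisor's rating variance; means are deliberately omitted."""
    return {
        supervisor: pvariance(ratings)
        for supervisor, ratings in ratings_by_supervisor.items()
    }


# Hypothetical 9-point ratings rendered by two supervisors.
ratings = {
    "inflating rater": [9, 9, 8, 9, 9, 8],
    "differentiating rater": [3, 5, 6, 7, 8, 9],
}

feedback = variance_feedback(ratings)
for supervisor, variance in feedback.items():
    print(supervisor, round(variance, 2))
```

The  supervisor  whose  ratings  crowd  the  top  of  the  scale  shows  the  smaller  variance,  which  is  the  signal  the  recommended  feedback  relies  on.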

Summary 

CPAS  developers  have  attempted  to  provide  the  Air  Force  with  a 
civilian  promotion  rating  system  which  is  credible,  valid,  legally
defensible,  and  manageable.  The  system  will  be  implemented  by 
April  1982  after  users  and  participants  have  had  significant  training 
on  the  system  goals  and  mechanics  to  include  training  raters  how  to 
rate. 

SUMMARY

The  Education  Center,  Marine  Corps  Development  and 
Education  Command,  is  experimenting  with  the  instructional 
applications  of  a  videodisc-based  interactive  television 
system.  Videodisc  technology  offers  the  potential  for  a  wide 
range  of  military  training  applications  including  archival 
storage  and  information  retrieval,  spatial  data  management, 
tactical  mapping,  multi-aspect  filming,  and  nonresident 
instruction.  The  Marine  Corps,  under  a  contract  with 
Interactive  Television  Company,  Inc.,  has  produced  an  initial
disc  that  presents  a  tactical  map  of  its  combined  arms  training 
area  at  Twentynine  Palms,  California.  This  paper  discusses  the 
instructional  and  technical  aspects  of  videodisc  technology 
involved  in  the  development  of  such  a  program.  The  authors 
describe  the  hardware  and  programming  techniques  used  in  the 
production  and  presentation  of  interactive  instructional 
programs  which  are  appropriate  to  the  learner's  responses  and 
needs.  They  also  review  some  of  the  ways  interactive 
television  already  is  being  used  in  education  and  training  and 
project  future  developments  in  interactive  video  instruction.

INTRODUCTION 


We  may  be  reaching  the  productivity  limits  of  our  current 
training  technology.  Massive  efforts  to  improve  "stand-up" 
lectures,  printed  training  materials,  and  "hands-on"  laboratory 
experience  have  begun  to  yield  too  few  significant  returns.  We 
need  revolutionary  new  techniques  that  will  break  through  the 
constraints  imposed  by  existing  training  technology  and  allow 
us  to  beat  the  current  tradeoffs  that  must  be  made  among  costs, 
quantity  and  quality.  Videodisc  technology  is  one  of  the  most 
promising  sources  of  these  new  techniques. 

Videodisc  technology  is  creating  opportunities  for  new 
kinds  and  forms  of  training  devices  at  surprisingly  low  cost. 
At  the  heart  of  this  technology  is  the  capability  to  access
tens  of  thousands  of  color  images,  including  stereo  sound,  in 
seconds  or  fractions  of  seconds.  When  coupled  with  the 
ubiquitous  microprocessor,  videodisc  technology  offers  new 
opportunities  for  training  applications  as  well  as  new  issues 
for  the  training  specialist. 

OPTICAL  VIDEODISC  TECHNOLOGY 

A  videodisc  is  similar  to  an  audio  record  except  that  each 
side  of  a  12-inch  disc  contains  30  minutes  of  television. 

Since  television  is  the  rapid  presentation  of  pictures  at  a 
rate  of  30  images  per  second,  each  side  of  a  videodisc  contains 
54,000  still  pictures  in  color. 

The  player  for  a  videodisc  is  much  like  a  turntable  except 
that  under  computer  control  the  player  can  rapidly  find  any 
particular  part  of  the  videodisc.  Specifically,  any  one  of  the 
54,000  images,  or  any  part  of  the  30  minutes  of  television  on  a 
videodisc,  can  be  located  typically  in  a  fraction  of  a  second. 

A  videodisc  provides  a  combination  of  moving  and  still  images, 
any  part  of  which  can  be  quickly  located. 

Videodiscs  are  made  like  audio  records  using  original
materials  that  can  be  movies,  videotapes,  pages  of  text,  tables 
of  numbers,  graphs,  charts,  maps,  drawings,  diagrams,  or 
photographs.  From  the  original  materials,  a  master  disc  is 
produced  that  in  turn  is  used  to  "press"  multiple  copies 
speedily  and  at  low  cost.

Three  types  of  videodiscs  and  videodisc  players  are  now 
available.  The  players  are  characterized  by  the  market  they 
are  targeted  for:  general  consumer  usage  and 
industrial/educational  applications.  Various  potential 
manufacturers,  such  as  JVC,  IBM,  Xerox,  Zenith,  to  name  a  few, 
are  waiting  in  the  wings,  but  only  six  are  now  marketing 
videodisc  players. 

-  Magnavox,  a  wholly  owned  subsidiary  of  Philips,  began
selling  a  consumer  model  player  for  about  $800  in 
December  1978. 

-  Discovision  (DVA),  under  license  by  MCA,  began  general
sales  of  an  industrial  player  for  about  $3,000  in  June 
1979. 

-  Thomson-CSF  began  sales  of  an  industrial  player  for
about  $3,500  early  in  1980. 

-  Pioneer  began  sales  of  a  consumer  player  for  about 
$700  late  in  1980. 

-  RCA  began  sales  of  their  consumer  videodisc  player
early  in  1981. 

-  Sony  began  limited  sales  of  an  industrial  videodisc 
player  in  mid-1981.

The  differences  between  consumer  and  industrial  systems  are
significant.  For  instructional  settings,  the  industrial 
players  incorporate  microprocessor  control  for  fast  random 
access,  pre-programmed  branching,  and  frame  selection. 
Microprocessors  also  have  been  installed  in  the  less  expensive 
consumer  players  to  provide  random  access,  branching,  and  frame 
selection.  However,  the  servo-mechanism  in  these  players 
requires  from  15  to  20  seconds  to  locate  a  frame  in  the  worst 
case.  Maximum  access  time  for  the  industrial  players  is  under 
four  seconds.  The  consumer  players  lend  themselves  to 
instructional  application,  but  the  functional  capabilities  of 
the  industrial  players  may  be  sufficiently  greater  as  to 
justify  their  higher  cost. 

With  the  exception  of  the  RCA  system,  the  videodisc 
players  discussed  above  are  all  optical,  laser-based  systems. 
They  use  a  12-inch  disc  with  a  spiral  track.  The  track  is 
pitted  with  oblong  depressions  or  micropits  about  1  micron  deep 
that  vary  in  accordance  with  the  audio  or  video  information 
they  represent.  During  playback  the  disc  spins  at  1,800 
revolutions  per  minute  while  these  micropits  modulate  a  low 
power  helium-neon  laser  focused  on  the  track  and  thereby 
generate  a  signal  that  is  processed  and  passed  to  a  standard 
video  monitor  (i.e.,  to  the  antenna  terminals  of  a  TV  set). 

One  video  frame  is  stored  on  each  track  and  there  are 
54,000  tracks  per  disc.  Video,  audio,  and  still  photographic 
information  can  all  be  intermingled  on  these  discs.  These 
videodisc  systems  effectively  provide  rapid  access  to  30
minutes  of  video  information,  30  minutes  of  analogue  audio 
information,  54,000  still  photographs,  well  in  excess  of  400 
hours  of  digitized  audio  information,  30  minutes  of  motion 
picture  information,  or  various  combinations  of  these  media. 

The  point  to  be  made  is  that  videodiscs  provide  rapid  random 
access  to  a  lot  of  information  which  can  be  inexpensively 
stored  and  replicated. 
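
The  figures  quoted  in  this  section  are  mutually  consistent,  as  the  short  sketch  below  shows:  30  minutes  at  30  images  per  second  gives  54,000  frames  per  side,  and  1,800  revolutions  per  minute  is  30  revolutions  per  second,  so  each  revolution  (one  track)  carries  exactly  one  frame.  The  function  names  and  the  convention  of  numbering  frames  from  0  are  assumptions  made  for  illustration;  actual  players  may  address  frames  differently.

```python
FRAMES_PER_SECOND = 30   # television display rate given in the text
MINUTES_PER_SIDE = 30    # playing time per disc side
DISC_RPM = 1800          # rotation speed during playback

FRAMES_PER_SIDE = MINUTES_PER_SIDE * 60 * FRAMES_PER_SECOND
REVOLUTIONS_PER_SECOND = DISC_RPM // 60


def frame_at(minutes, seconds=0):
    """Frame number (counted from 0) at a given playing time."""
    frame = (minutes * 60 + seconds) * FRAMES_PER_SECOND
    if not 0 <= frame < FRAMES_PER_SIDE:
        raise ValueError("time is outside the disc side")
    return frame


def time_of(frame):
    """Playing time as (minutes, seconds, frame within the second)."""
    if not 0 <= frame < FRAMES_PER_SIDE:
        raise ValueError("frame is outside the disc side")
    total_seconds, frame_in_second = divmod(frame, FRAMES_PER_SECOND)
    minutes, seconds = divmod(total_seconds, 60)
    return minutes, seconds, frame_in_second


print(FRAMES_PER_SIDE, REVOLUTIONS_PER_SECOND)
print(frame_at(12, 30), time_of(FRAMES_PER_SIDE - 1))
```

Because  a  frame  number  maps  directly  to  a  track,  a  microprocessor  can  turn  a  requested  playing  time  or  still  picture  into  a  single  seek  target.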

APPLICATIONS  OF  VIDEODISC  TECHNOLOGY  TO  TRAINING 

One  of  the  most  exciting  aspects  of  videodisc  technology  is
that  it  makes  possible  an  entirely  new  set  of  training 
experiences,  ranging  from  new  kinds  of  training  movies  to 
low-cost  simulators. 

Interactive  Movies


Interactive  Movies  translate  movie  viewing  into  an  active
participatory  process.  In  effect,  the  viewer  becomes  the 
director  and  controls  many  features  of  the  movie.  A  sampling 
of  feature  controls  available  to  the  viewer  is  the  following: 

1.  Perspective.  The  movie  can  be  seen  from  different
directions.  In  effect,  the  viewer  can  "walk  around"  ongoing 
action  in  the  movie  or  view  it  from  above  or  below. 

2.  Detail.  The  viewer  can  "zoom  in"  to  see  selected,  detailed
aspects  of  the  ongoing  action  or  can  "back  off"  to  gain  more
perspective  on  the  action  and  simultaneous  activity  elsewhere.

3.  Level  of  Instruction.  In  some  cases,  the  ongoing  action 
may  be  too  rich  in  detail  or  it  may  include  too  much  irrelevant 
detail.  The  viewer  can  hear  more  or  less  about  the  ongoing 
process  by  so  instructing  the  Interactive  Movie  System. 

4.  Level  of  Abstraction.  In  some  instances  the  viewer  may 
wish  to  see  the  process  being  described  in  an  entirely 
different  form.  For  example,  the  viewer  might  choose  to  see  an 
animated  line  drawing  of  an  engine's  operation  to  get  a  clearer 
understanding  of  what  is  going  on.  In  some  cases,  elements 
being  shown  in  the  line  drawings  may  be  invisible  in  the 
ongoing  action  —  for  instance,  electrons  or  force  fields  can 
be  shown. 

5.  Speed.  Viewers  can,  of  course,  view  the  ongoing  action  at
a  wide  continuous  range  of  speed  —  including  reverse  action 
and  no  action  (still  frame).

6.  Plot.  Viewers  can  change  the  "plot"  to  see  the  results  of
different  decisions  made  at  selected  times  during  the  movie. 

A  typical  application  for  Interactive  Movies  would  be  in
training  (and  aiding)  equipment  technicians.  The  technician 
could  not  only  see  how  a  particular  part  is  located  and 
installed  from  several  points  of  view  (e.g.  top  versus  bottom) 
but  could  interactively  control  how  detailed  a  description  is 
either  seen  or  heard  regarding  that  maintenance  activity. 

Several  Interactive  Movie  videodiscs  have  been  completed 
using  hand-to-hand  combat  (i.e.,  karate)  as  the  subject  area. 
These  discs  let  the  viewer  not  only  control  playing  a  particular
karate  move  backward  and  forward  at  any  rate,  but  also  include 
multiple  views  and  closeup  views  following  every  move  from  four 
different  positions.  Several  Interactive  Movies  that  focus  on 
equipment  maintenance  tasks  are  in  progress. 




Surrogate  Travel  (Media  Mapping) 


Surrogate  Travel  forms  a  new  approach  to  local 
familiarization  and  low-cost  trainers.  The  basic  principle  is 
simple.  Up  to  108,000  images,  showing  discontinuous  motion 
along  a  large  number  of  paths  in  an  area,  are  stored  on  a 
videodisc.  Under  microprocessor  control,  the  user  accesses 
different  sections  of  the  disc,  simulating  movement  over  the 
selected  path. 

The  user  sees  with  photographic  realism  the  area  of 
interest.  Unlike  a  travel  movie,  the  user  is  able  to  both 
choose  the  path  and  control  the  speed  of  advance  through  the 
area  using  simple  controls.  The  videodisc  frames  the  viewer 
sees  originate  as  filmed  views  of  what  one  actually  would  see 
in  the  area.  To  allow  coverage  of  very  large  areas,  the  frames
are  taken  at  periodic  intervals  that  may  range  from  every  foot 
inside  a  building,  to  every  ten  feet  down  a  city  street,  to 
hundreds  of  feet  in  a  large  open  area,  e.g.,  a  harbor. 

The rate of frame playback, which is the number of times
each video frame is displayed before the next frame is shown,
determines the apparent speed of travel. Free choice in what
routes  may  be  taken  is  obtained  by  filming  all  possible  paths 
in  the  area  as  well  as  all  possible  turns  through  all 
intersections.  While  it  might  first  appear  that  this  would  be 
a  time  consuming  and  expensive  technology,  it  is  in  fact 
relatively  efficient  because  of  the  design  of  special  equipment 
and  procedures  for  doing  the  filming. 
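
The relation between playback rate and apparent speed can be
illustrated with a short sketch. It assumes a player that displays
30 video frames per second (a common television rate, not stated
in the text above); the function name and sample values are
illustrative only.

```python
# Illustrative sketch: apparent travel speed in surrogate travel.
# Assumption (not from the text): the player shows 30 video frames per second.
DISPLAY_RATE_HZ = 30.0


def apparent_speed_mph(frame_spacing_ft, repeats_per_frame):
    """Apparent speed when each stored image is shown `repeats_per_frame` times."""
    if repeats_per_frame < 1:
        raise ValueError("each frame must be displayed at least once")
    seconds_per_image = repeats_per_frame / DISPLAY_RATE_HZ
    feet_per_second = frame_spacing_ft / seconds_per_image
    return feet_per_second * 3600.0 / 5280.0


# Images taken every ten feet down a city street, shown 30, 15, or 5 times each:
for repeats in (30, 15, 5):
    print(repeats, round(apparent_speed_mph(10.0, repeats), 1))
```

Showing each image fewer times before advancing raises the apparent
speed; the stored images themselves do not change.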

Demonstrations  of  this  technology  have  been  developed  for 
building  interiors  (MIT,  National  Gallery  of  Art),  a  small  town 
(Aspen, Colorado), an industrial facility (nuclear power
plant), a weapon site, San Francisco Harbor, and the Delta
Corridor area of the Marine Corps Air Ground Combat Center
(MCAGCC), Twentynine Palms, California.

To  provide  training  in  reading  and  understanding  maps,  the 
photograph-based  Surrogate  Travel,  or  Media  Map,  is  linked  to 
different sorts of maps of the area. In effect, the viewer can
travel  across  a  map,  can  focus  in  on  it,  getting  greater  and 
greater  detail  from  what  can  be  presented  by  standard  map 
symbology,  and  then  "fall  through"  the  map  to  see 
photographically  what  the  map  depicts.  In  addition,  the  viewer 
can  switch  among  different  types  of  maps  (e.g.,  topographic, 
infrared,  etc.)  to  develop  an  understanding  of  how  different 
map  symbologies  and  representations  interact. 

In  addition  to  ground  level  travel,  including  the  inside 
and  outside  of  buildings,  aerial  flight  experience  can  be 
produced and used to simulate aerial reconnaissance or for
flight training. Similarly, other forms of travel experience,
such  as  anchorage  piloting  and  low  level  nap  of  the  earth 
flying,  are  also  easily  accommodated. 

Surrogate travel can also be used to provide training on
routine and emergency procedures, physical plant maintenance,
safety, security, as well as other training requirements found
in ships, military and industrial facilities. In these
applications blueprints, floor plans, procedures, and
up-to-date reference materials are linked with the photography
of the site to provide a powerful and easy to use training
system.

Electronic  Libraries 

Electronic  libraries,  in  the  form  of  Spatial  Data 
Management  Systems  (SDMS)  provide  students  and  instructors  with 
quick  and  easy  access  to  an  assortment  of  multi-source  and 
multi-media  information.  Users  literally  "fly  over" 
information  and  select  what  they  want  by  simply  pointing. 
Spatiality  is  used  to  group  materials  into  lesson  plans  so  that 
different  information  spaces  represent  course  concept, 
additional  instruction,  and  assessment  procedures. 

Stored  on  a  videodisc  are  tens  of  thousands  of  frames 
consisting  of  photographs,  diagrams,  charts,  texts,  movies, 
spoken  speeches,  music,  graphs,  etc.  The  pages  can  be 
organized,  reassembled,  segmented,  and/or  duplicated  in 
accordance  with  the  user's  need  and  growing  sophistication  with 
the  subject  matter.  The  pages  can  be  annotated,  highlighted, 
drawn-on,  underlined,  etc.,  at  the  user's  convenience  and 
pleasure.

For  the  instructor,  the  SDMS  provides  ready  access  to  a 
wealth of material which might otherwise be inaccessible.
Instructors  can  access  the  SDMS  to  create  their  own  information 
spaces  (i.e.,  courses  or  lectures)  and  subsequently  present 
such materials to large audiences in single locations via large
screen  television  projection  or  to  multiple  locations  through 
cable  distribution  systems. 

Students can independently use the SDMS for self-paced
instruction  by  either  working  through  previously  designed 
information  spaces  or  by  browsing  on  their  own.  When  students 
and  instructors  are  in  remote  locations,  offsite  instruction  is 
facilitated by linking two or more SDMSs together using
regular  telephone  lines.  In  this  manner,  a  student  or 
instructor  can  literally  fly  the  other  to  a  topic  of  interest, 
sharing  at  geographically  remote  sites  a  large  library  of 
information.


The  same  video  materials  can  be  used  for  hundreds  of 
different  users.  The  only  thing  that  must  be  changed  from  user 
to  user  is  the  magnetic  storage  medium  (usually  a  floppy  disc) 
which  serves  as  the  user's  private  librarian  for  the  videodisc. 

MARINE  CORPS  INTEREST  IN  VIDEODISC  TECHNOLOGY 

Marine  Corps  interest  in  the  potential  of  videodisc 
technology  stems  from  a  visit  by  several  members  of  the  staff 
of the Education Center of the Marine Corps Development and
Education Command (MCDEC) to the Massachusetts Institute of
Technology (MIT) in the Fall of 1978. The Man-Machine
Interface Lab at MIT had developed the prototype of the spatial
data management system (SDMS) under sponsorship of the Defense
Advanced Research Projects Agency (DARPA). Education Center
personnel  sensed  the  immense  potential  that  interactive 
videodisc  systems  held  for  improving  military  training  and 
professional  education.  Initial  interest  centered  around  the 
media  mapping  and  SDMS  configurations.  The  Director,  Education 
Center,  subsequently  requested  that  the  Marine  Corps  procure  a 
single  small-screen  SDMS  for  the  Education  Center  to  use  for 
experimental  purposes.  The  system,  which  was  configured  and 
delivered  to  the  Education  Center  by  Interactive  Television 
Company  of  Arlington,  Virginia,  consists  of  the  following 
equipment:

. Microprocessor-based Spatial Data Management System
(microprocessor, keyboard, joystick control, and CRT display).

.  MCA  Model  7820  Optical  Videodisc  Player. 

.  Black  and  White  Video  Printer. 

.  15"  Color  Monitor. 

It  may  be  worthwhile  at  this  point  to  briefly  review  some 
of  the  major  capabilities  of  the  Marine  Corps  interactive 
videodisc  system  configuration  in  order  to  show  how  these 
capabilities  may  be  exploited  in  supporting  and  enhancing 
training  and  education  programs: 

A.  Ease  of  Operation  by  User 

. No typing or keyboard is required
. User programming is not required
. No query language must be learned
. No training period is required
. Rapid retrieval
. Joystick controls
. Keyboard or symbological data retrieval
. Portable

B.  Videodisc 

. 54,000 pictures/side

. Each picture/frame has a displayable electronic I.D.
number 1-54,000

. Rapid access to any picture/frame (1/8 sec. to 3
seconds)

. Long life, durable, portable
. Thousands of copies of a master are possible
. Dual speech/sound tracks on each side

C. Applications

.  Group  instruction  with  either  multiple  small  screen 
display  and/or  large  screen  display 

. Personalized interactive, self-paced learning with
built-in testing and tutoring, augmenting or replacing
written materials, e.g., programmed texts, etc.

. Dual language capability on narrations

.  Nonresident  instruction  via  teleconferencing  and/or 
videodisc-based  replications  of  resident  instruction 

.  Lecture/course  development  -  an  electronic  data  base 
with  visual  aids  which  serves  as  a  repository  for 
instructional  materials 

. Multi-Media - videodisc, magnetic digital disc, and
microfiche  in  combination  using  text,  movies,  slides, 
drawings,  animation,  graphs,  maps,  aerial  imagery, 
speech,  sound,  etc. 

. Teleconferencing - phone lines or hardwire

. Management Information System (MIS)/Archival
Repository - can be used as a library of archival
materials, for storage and retrieval of operation
plans, orders, maps, historical documents,
photography, statistical data, important papers,
directives, legal records, etc., either classified or
unclassified.

.  Media  Mapping  -  user  controls  path,  viewpoint,  rate  of 
travel,  time  of  day,  time  of  year,  weather,  mode  of 
travel,  detail,  and  any  ancillary  information  (so  long 
as it has been planned for and included in the program).
Media mapping discs can be used for terrain analysis,
targeting, simulation, intelligence, anti-terrorist
planning, multi-sensor planning and display, tactical
situation map/status board, combat in built-up areas,
etc.

.  Interactive  Movies  -  self-paced  or  group  instruction, 
lecture  support,  tactics  training,  decision  making, 
equipment  maintenance  and  operator  training, 
simulation,  etc. 

THE  TWENTYNINE  PALMS  VIDEODISC  PROJECT 

As  its  initial  project,  the  Education  Center  undertook  to 
produce  a  media  map  of  the  Delta  Corridor  area  of  the  Marine 
Corps  Air  Ground  Combat  Center  (MCAGCC)  at  Twentynine  Palms, 
California.  The  Marine  Corps  conducts  a  number  of  combined 
arms  exercises  in  the  Delta  Corridor  every  year.  Because  the 
terrain  is  of  critical  concern  to  units  throughout  the  Marine 
Corps,  it  was  concluded  that  the  disc  would  serve  as  a  solid 
test bed for the Marine Corps-wide use of SDMS technology.
Interactive  Television  Company  was  eventually  awarded  the 
contract  to  film  the  terrain  and  to  produce  and  index  the  media 
map  videodisc.  The  disc  has  been  produced  and  indexed.  It 
will  be  evaluated  during  the  period  January  -  February  1982. 

The  Twentynine  Palms  videodisc  contains  extensive  coverage 
of  the  Delta  Corridor,  the  Expeditionary  Air  Field,  and  the 
Camp  Wilson  area  of  the  Combat  Center.  It  includes  ground 
routes  and  corresponding  aerial  views  from  both  200  and  1000 
feet.  For  planning  and  instructional  purposes  maps  and 
photography  of  major  Delta  Corridor  features  are  included  on 
the  videodisc.  Following  is  a  description  of  the  videodisc's 
contents.

Ground Routes. On the videodisc are four-way views
(forward,  reverse  direction,  left  and  right)  of  over  100  miles 
of  ground  travel. 

The  ground  routes  were  filmed  from  a  Marine  Corps  M880 
truck with a 30 foot picture interval. The truck was driven at
approximately five miles per hour over terrain varying from
smooth  asphalt  roads  to  very  rugged  open  terrain. 
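
As a rough check on these figures, the sketch below counts the
images implied by the stated route length, picture interval, and
number of views, and the time between exposures at the stated truck
speed. It assumes exactly 100 miles of route and a constant five
miles per hour; the function names are illustrative.

```python
FEET_PER_MILE = 5280


def images_needed(route_miles, interval_ft, views):
    """Number of stored images for a route filmed at a fixed picture interval."""
    return int(route_miles * FEET_PER_MILE / interval_ft) * views


def seconds_between_exposures(speed_mph, interval_ft):
    """Time between exposures for a vehicle moving at constant speed."""
    feet_per_second = speed_mph * FEET_PER_MILE / 3600.0
    return interval_ft / feet_per_second


# Assumed inputs: 100 miles of route, 30 foot interval, four views, 5 mph.
print(images_needed(100, 30, 4))                   # 70400
print(round(seconds_between_exposures(5, 30), 2))  # 4.09
```

Under these assumptions the four ground views come to roughly
70,000 images, which can be compared with the 54,000 pictures per
side cited earlier.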

Because  of  the  rugged  terrain  a  gyro  stabilizer  camera 
mount  was  used  to  provide  side  and  front  view  images  with  a 
constantly  even  horizon.  To  date,  most  viewers  have  not  been 
able  to  distinguish  between  the  stabilized  and  unstabilized 
views  on  any  of  the  terrain  -  paved  or  rough. 

Aerial Routes. A Marine Corps HU-1 helicopter was used to
film aerial views corresponding to the ground routes. Aerial
views  at  200  and  1000  feet  were  mapped.  Picture  intervals  of 
50  and  250  feet  respectively  were  used  for  the  aerial  mapping. 

For  the  aerial  photography,  a  forward-looking  camera  was 
mounted  to  one  of  the  helicopter  skids.  A  side-looking  camera 
was  clamped  down  looking  out  from  the  door  of  the  helicopter. 

An  intervalometer  was  used  to  control  the  camera  framing  rates. 

Ground and Aerial Features. Throughout the Delta Corridor,
selected  terrain  and  tactical  features  were  identified  and 
filmed.  Terrain  features  included  examples  of  saddles,  crests 
(military  and  topographic),  dry  washes,  passes,  contours  and  so 
on.  Tactical  features  included  panoramic  and  close-up 
photography  of  major  objective  areas,  staging  and  assault 
areas,  major  passes,  a  firing  range,  and  other  areas  used  in 
Delta  Corridor  exercises.  Ground  and  aerial  photography  of 
both  terrain  and  tactical  features  was  included  on  the 
videodisc. 

In  the  Delta  Corridor,  the  helicopter  was  used  to  film 
panoramic  views  as  would  be  seen  by  an  observer  located  on  any 
one  of  the  three  Observation  Posts  overlooking  the  Delta 
Corridor  area.  Observation  Post  views  look  outward  in  a  360 
degree sweep. Inward views are also included so users can
readily  identify  the  observation  posts  and  know  what  is  located 
on  them. 

Ground and aerial photography of the Expeditionary Air
Field and Camp Wilson area, which is used as the base camp for
units visiting the Combat Center to participate in exercises,
is also on the videodisc. Included are views of the runway,
support  facilities,  fuel  bunkers,  mess  and  shower  facilities, 
and  housing  areas. 

Maps. Several topographic maps of different scales were
filmed  and  included  on  the  videodisc.  These  maps  can  be  used 
to  "fly  over"  the  mapped  areas  moving  in  and  out  for  greater  or 
less detail. Under microcomputer control, the maps can be
correlated  to  aerial  and  ground  photography. 


Narrated  Introduction.  A  narrated  introduction  at  the 
start  of  the  videodisc  describes  the  project's  applications  to 
Marine  Corps  problems.  The  introduction  also  summarizes  the 
videodisc's  content. 

It  is  anticipated  that  the  Twentynine  Palms  Videodisc 
Media  Map  will  serve  multiple  training  functions  within  the 
Marine  Corps.  For  the  Marine  officer  planning  or  engaged  in 
exercises  within  the  Delta  Corridor,  the  Videodisc  Map  will 
serve  as  an  aid  to  planning  and  terrain  familiarization.  For 
Marine  Corps  elements  (such  as  the  Education  Center  at 
Quantico,  Virginia)  engaged  in  teaching  land  navigation, 
terrain  analysis,  and  map  reading,  the  Videodisc  Map  will 
provide  the  instructor  with  a  valuable  training  aid  that 
integrates  ground,  aerial,  and  map  materials  for  instruction. 

FUTURE  MARINE  CORPS  PROJECTS 

The  Marine  Corps  could  select  any  one  of  a  number  of 
useful  videodisc  applications  for  its  next  project.  The 
following  is  a  partial  list  of  possible  applications: 

.  Surrogate  travel  through  a  Landing  Helicopter  Assault 
ship (LHA) to aid in the pre-embarkation orientation of
selected  spaces  and  operational  features  of  interest  to  the 
Landing  Force.  Similar  discs  could  be  developed  for  other 
appropriate  types  of  amphibious  ships. 

.  Interactive  movie  photography,  incorporating  surrogate 
travel  techniques,  which  would  enable  students  to  become 
familiar  with  high-cost,  low-density  equipment  that  they 
ordinarily  would  not  get  an  opportunity  to  work  with  in  the 
classroom.

. Interactive videodisc-based simulations of selected USMC
command  and  control  (shore  and  seabased)  facilities. 

.  An  electronic  library  of  resource  information  collected 
for  threat  analysis  purposes.  The  disc-based  information  could 
supplement  or  replace  existing  hard  copy  media  for 
instructional development, self-paced student research,
reference  or  instruction. 

.  An  electronic  library  of  information,  including  media  map 
material,  of  terrain,  lines  of  communications,  port  facilities, 
beaches,  landing  zones,  airfields,  and  cities  to  be  combined 
with  other  maps,  photography  and  target  data  required  to 
support  amphibious  operation  planners,  instructors,  and 
students  involved  in  war  gaming  exercises. 

. An electronic library of Marine Corps archival material
relating  to  the  evolution  of  amphibious  operations.  This  would 
serve  a  two-fold  purpose  in  that  it  would:  1.  ensure  against 
the  loss  of  critical  information  by  providing  working  copies  of 
historical  documents;  2.  establish  a  unique  cross  reference 
and  information  retrieval  system  for  researchers  and  amphibious 
planners.

It  should  be  recognized  that  the  videodisc  is  a  storage 
medium,  not  an  instructional  system.  But  it  is  a  storage 
medium  of  such  remarkable  capacity  and  with  such  varied 
capabilities  that,  if  used  with  imagination  in  support  of 
carefully  defined  objectives,  it  can  elevate  existing 
instructional  systems  to  new  levels  of  excellence.  As  the 
Marine  Corps  continues  to  define  its  training  objectives,  it 
undoubtedly  will  identify  new  applications  for  the  videodisc 
and  related  forms  of  telecommunication  technology.  In  the 
meantime,  it  will  continue  to  monitor  the  exploratory  efforts 
of  the  Army,  the  Air  Force,  and  other  agencies  in  and  out  of 
government that are developing new uses for this powerful
medium.




AD P001 326


TESTING  DURING  TRAINING: 

WHY  DOES  IT  ENHANCE  MOVEMENT  RETENTION 


Joseph  D.  Hagman 

US  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 


Linear movement retention was examined for training methods emphasizing
(repeating) either presentation (p) or test (t) trials. P-trials were
experimenter-defined study movements constrained by a mechanical stop;
t-trials were learner-defined recall movements unconstrained by the
stop.  Separate  groups  of  governmental  employees  received  training 
consisting  of  three,  6-trial  cycles.  Cycles  began  with  a  p-trial  that 
defined  the  criterion  movement  to  be  remembered.  The  five  remaining 
trials  of  each  cycle  varied  in  type  across  groups.  One  group,  for 
example, performed successive t-trials, whereas another performed successive
p-trials yoked in value to the first group's t-trials. Retention
was then examined at 3 minutes and 24 hours after training.
Absolute  (unsigned)  error  revealed  that  t-trials  were  more  effective 
than  yoked  p-trials  in  promoting  movement  retention.  The  data  were 
consistent  with  the  hypothesis  that  retention  benefits  obtained  from 
testing during training result from better initial learning (encoding)
of kinesthetic cues generated under a learner-defined than under an
experimenter-defined movement execution mode. It was concluded that
testing can be used not only to evaluate but also to improve motor skill
retention.
TESTING DURING TRAINING:

WHY  DOES  IT  ENHANCE  MOVEMENT  RETENTION 

JOSEPH  D.  HAGMAN 

US  ARMY  RESEARCH  INSTITUTE  FOR  THE  BEHAVIORAL  AND  SOCIAL  SCIENCES 

ALEXANDRIA,  VA  22333 


The  Army's  primary  peacetime  mission  is  to  maintain  combat  readiness 
(Guthrie, 1979). To be combat ready, soldiers must first become proficient
in their performance of job tasks and then retain this proficiency
over what can be prolonged periods of no practice. One way to enable
soldiers  both  to  reach  and  maintain  combat  readiness  is  through  the  use 
of  task  training  methods  that  promote  effective  acquisition  and  retention. 

To  do  this,  these  methods  must  be  identified  and  compared. 

A  review  of  the  training  research  literature  reveals  that  training 
methods  have  been  compared  primarily  within  the  context  of  laboratory 
experiments.  Here,  training  has  involved  the  execution  of  presentation 
(p) trials, where to-be-learned information is presented by the experimenter
to the learner for study, and test (t) trials, where this information
is removed and the learner attempts to recall (reproduce) it from
memory. Although standard training methods involve alternation of p- and
t-trials (e.g., Tulving, 1967; Wrisberg & Schmidt, 1975), the most
effective number and sequential arrangement of p- and t-trials to use is
a  matter  of  debate.  From  a  traditional  learning  theory  viewpoint,  where 
p-trials  are  seen  as  having  an  effect  similar  to  reinforcement  (Adams  & 
Dijkstra,  1966),  training  methods  that  emphasize  (repeat)  p-trials 
should  be  more  effective  than  those  that  repeat  t-trials.  P-trial 
repetition  increases  the  number  of  reinforcement  opportunities  during 
training,  and  therefore,  should  enhance  both  acquisition  and  retention. 

From  a  contemporary  cognitive  learning  viewpoint,  on  the  other  hand, 
information  processing  activities  such  as  memory  retrieval  and  internal 
item  generation  are  considered  important  aspects  of  acquisition  and 
retention  (Bjork,  1975).  Because  t-trials  provide  an  opportunity  to 
perform  these  activities  on  information  studied  during  p-trials,  training 
methods  that  repeat  t-trials  should  also  be  effective. 

P-trial  effects  have  been  documented  in  numerous  experiments  showing 
that  improved  performance  occurs  when  p-trials  are  repeated  during 
training  (e.g.,  Adams  &  Dijkstra,  1966).  Only  recently,  however,  have 
improvements associated with t-trial repetition been reported. Researchers
have shown that with verbal tasks t-trials contribute not only to
acquisition (e.g., Lachman & Laughery, 1968) but also to retention
(Hogan & Kintsch, 1971; Wenger, Thompson, & Bartling, 1980). Even more
recently, t-trials have been reported to influence motor task performance.
Hagman (1980a, b), for example, had persons learn either the
distance (extent) or end-location (terminal position) of linear positioning
movements under training methods emphasizing either p- or t-trial repetition.
P-trials were movements terminated by a mechanical stop that was prepositioned
by the experimenter to ensure execution of the to-be-learned criterion
movement cue (i.e., distance or end-location). T-trials were movements
performed with the stop removed. It was during t-trials that learners
stopped their own movement when they thought they had accurately recalled
the criterion movement cue. Results of both experiments showed that
movement  cue  acquisition  was  better  when  p-trials  were  repeated  during 
training,  whereas  long-term  retention  was  better  when  t-trials  were 
repeated  during  training. 

The  purpose  of  the  present  experiment  was  to  extend  these  earlier 
findings by testing two hypotheses suggested (Hagman, 1980) to account
for the beneficial effect of t-trials on movement cue retention. The
first hypothesis relies on the procedural distinction between experimenter-defined
(i.e., performed with the stop present) and learner-defined
(i.e., performed with the stop absent) movements. Evidence
suggests  that  movement  cues  generated  under  a  learner-defined  execution 
mode  are  retained  better  than  those  generated  under  an  experimenter- 
defined  execution  mode  (Kelso,  1977;  Stelmach,  Kelso  &  McCullagh,  1976). 
This  enhanced  retention  is  caused  by  superior  learning  (encoding)  of 
learner-defined  movement  cues  brought  about  by  the  learner's  ability  to 
predict or anticipate cue values prior to movement initiation (e.g.,
Kelso, 1977). T-trials allow for prediction because they are learner-defined,
whereas p-trials do not allow for prediction because they are
experimenter-defined.  In  a  multitrial  training  context  learners  base 
posttraining  recall  attempts  on  their  retention  of  cues  generated  during 
the  trial  type  repeated  during  training.  That  is,  learners  rely  on  p- 
trial  retention  when  p-trials  are  repeated,  whereas  they  rely  on  t-trial 
retention  when  t-trials  are  repeated.  Because  t-trials  are  learner- 
defined,  retention  of  t-trial  generated  cues  should  be  superior  to 
retention  of  p-trial  generated  cues  which  are  experimenter-defined. 

Thus,  enhanced  long-term  motor  retention  should  occur  with  training 
methods  that  emphasize  learner-defined  t-trial  repetition. 

The  second  hypothesis  proposed  to  account  for  the  beneficial  effect 
of  t-trial  repetition  on  movement  cue  retention  involves  the  notions  of 
movement variability and motor schema. The motor schema is an abstraction
of task and environmental characteristics that develops through
repeated  and  varied  movement  during  training  (Schmidt,  1975),  and  serves 
as  a  rule  or  concept  for  movement  generation.  Researchers  have  found 
that  as  variability  increases  during  training  the  abstracted  schema 
information becomes increasingly resistant to forgetting (Newell &
Shapiro, 1976; Posner & Keele, 1970). In the previous experiments by
Hagman (1980a, b), variability during training was generated at t-trials
because  learners  were  inconsistent  in  their  recall  attempts.  In  contrast, 
no  variability  was  generated  by  p-trials  because  all  were  identical  in 
terms  of  distance  (Hagman,  1980a)  or  end-location  (Hagman,  1980).  As  a 
result,  it  could  be  argued  that  schema  strength  was  greater  after 
repeated  t-trial  training  than  after  repeated  p-trial  training.  Thus, 
one  would  predict  better  retention  under  the  former  than  under  the 
latter  training  method. 

The  general  approach  used  in  the  present  experiment  to  test  the 
validity of these two hypotheses involved yoking separate p-trial training
method groups to both the t-trial distance and t-trial end-location
groups trained earlier. Yoking involved using a mechanical stop to
ensure that p-trials of the yoked groups were identical to the t-trials
of the other groups in terms of both distance and end-location. Thus,
yoking afforded the means of equating p- and t-trials in terms of variability
during  training  but  allowed  the  distinction  to  remain  between  p-  and  t- 
trial  execution  mode  (i.e.,  experimenter-  versus  learner-defined).  If 
variability  per  se  during  training  is  the  key  to  enhanced  retention  of 
movement  cues,  then  one  would  expect  the  retention  displayed  by  the  two 
yoked  p-trial  groups  not  to  differ  from  that  displayed  by  the  two  t- 
trial  groups.  If,  on  the  other  hand,  movement  execution  mode  during 
training  is  the  key  to  enhanced  retention,  then  one  would  expect  the  two 
t-trial  groups  to  display  retention  superior  to  that  of  the  two  yoked  p- 
trial  groups. 

Method 

Subjects 

Sixty  governmental  employees  volunteered  to  serve  as  participants  in 
the  experiment.  All  were  members  of  the  professional  and  clerical  staff 
of  the  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences. 

Apparatus 

Participants  were  required  to  make  movements  from  left  to  right 
using  a  metal  slide  that  ran  along  a  linear  track  consisting  of  two 
stainless  steel  rods  35  inches  (88.9  cm)  in  length.  Two  Thompson  Ball 
Bushings  supported  the  slide  on  the  rods  which  were  mounted  in  parallel 
on  a  metal  frame  4.25  inches  (11  cm)  apart  and  11  inches  (27.94  cm) 
above  the  frame  base.  The  base  rested  on  a  standard  table  top  31  inches 
(78.74  cm)  from  the  floor.  A  second  slide  was  used  by  the  experimenter 
to  stop  movement  of  the  first  slide  along  the  track.  A  pointer  attached 
to  the  experimenter's  side  of  each  slide  ran  along  a  meter  stick  to 
indicate  respective  slide  position.  Additional  apparatus  included  a  chin 
rest  to  stabilize  head  position,  earphones  through  which  tape-recorded 
procedural  commands  were  delivered,  and  a  blindfold  to  eliminate  visual 
cues. 

Design 

The  experiment  contained  an  acquisition  and  a  retention  segment  as 
shown  in  Figure  1.  The  acquisition  segment  consisted  of  18  training 
trials  divided  into  three  cycles  of  six  trials  each.  Cycles  contained 
p- and t-trials. P-trials were experimenter-defined movements terminated
by the mechanical stop. The stop was prepositioned by the experimenter
to ensure that participants executed (studied) the criterion distance
or end-location at p-trials and duplicated t-trials at yoked p-trials,
i.e., py. T-trials were learner-defined recall movements unconstrained
by the mechanical stop. Four training method groups were included in
the experiment, i.e., DISTANCE PRESENTATION (DP), DISTANCE TEST (DT),
END-LOCATION PRESENTATION (LP), and END-LOCATION TEST (LT). Training
methods differed in their emphasis on p- and t-trials performed during
each cycle. Group DT performed cycles containing an initial to-be-learned
criterion p-trial followed by five successive recall t-trials.

Group  DP  performed  cycles  containing  six  successive  p-trials.  The  first 
was  the  criterion,  but  the  next  five  were  yoked  in  distance  to  the 
corresponding  t-trials  of  Group  DT.  Yoking  was  also  applied  to  the  two 
end-location groups in a similar fashion. Because of this yoking
procedure, Groups DT and LT were trained before Groups DP and LP. Data
from the two yoked PRESENTATION groups were collected in the present
experiment, whereas data from the two TEST groups were collected earlier
(Hagman, 1980a, b). Although trained at different times, subjects in the
two yoked groups were drawn from the same population as those in the two
TEST groups.

Figure 1. Trial sequence for training method groups at acquisition and retention.

The  retention  segment  of  the  experiment  consisted  of  a  single  t- 
trial  performed  by  each  group  at  both  3  minutes  and  24  hours  after 
acquisition,  as  shown  in  Figure  1.  Separate  2x2  mixed  factorial  designs 
were  used  to  examine  distance  and  end-location  cue  retention.  The 
between-subjects factor was group (DP, DT, or LP, LT) and the within-subjects
factor was retention interval (3 minutes, 24 hours). Fifteen
participants were assigned to each of the four training method groups
with  the  constraint  that  each  group  contain  the  same  proportion  of  men 
and  women. 
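
The yoking arrangement described above can be summarized in a short
sketch. It assumes movement values are recorded in millimeters; the
criterion and recall values shown are invented for illustration and
are not data from the experiment, and the function names are
hypothetical.

```python
def build_recall_cycle(criterion_mm, recalls_mm):
    """One 6-trial cycle for a TEST group: a criterion p-trial, then five t-trials."""
    if len(recalls_mm) != 5:
        raise ValueError("a cycle has exactly five recall t-trials")
    return [("p", criterion_mm)] + [("t", value) for value in recalls_mm]


def build_yoked_cycle(recall_cycle):
    """Matching cycle for the yoked PRESENTATION group.

    The first trial is the same criterion p-trial; each later p-trial has
    the stop set at the value recalled on the corresponding t-trial.
    """
    return [("p", value) for _, value in recall_cycle]


# Hypothetical values in millimeters, for illustration only.
dt_cycle = build_recall_cycle(250, [262, 241, 255, 247, 258])
dp_cycle = build_yoked_cycle(dt_cycle)
print(dp_cycle)
```

The two cycles contain the same sequence of movement values, so they
are matched in variability and differ only in execution mode
(learner-defined versus experimenter-defined).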

Procedure 

Participants were instructed to learn and remember either movement
distance or end-location depending on their group. Those in groups DP
and LP were also told of the yoking procedure. All participants were
then shown a written copy of the trial command sequence that they would
be hearing and told the meaning of each command. The p-trial command
was "Movement" and the t-trial command was "Recall Movement." Each of
these  commands  was  preceded  by  "Ready"  and  followed  by  "Rest."  At 
"Ready"  the  experimenter  grasped  the  participant's  hand  and  placed  it  on 
the  handle  of  the  slide.  Five  seconds  later,  the  participant  heard 
either  "Movement"  or  "Recall  Movement"  depending  on  the  trial  type.  At 
"Movement,"  participants  moved  the  slide  across  the  track  until  contacting 
the  mechanical  stop.  At  "Recall  Movement,"  those  in  Groups  DT  and  LT 
moved  the  slide  across  until  they  felt  that  they  had  recalled  the  criterion 
distance  or  end-location,  whereas  those  in  Groups  DP  and  LP  moved  the 
slide  along  until  contacting  a  stop.  This  stop  was  prepositioned  by  the 
experimenter  at  the  distance  or  end-location  recalled  by  participants  in 
Groups  DT  and  LT  at  t-trial  execution.  Five  seconds  were  allowed  for 
movement  execution.  During  this  interval,  participants  received  white 
noise  through  earphones  to  eliminate  auditory  cues  resulting  from 
displacement  of  the  slide.  "Rest"  marked  the  beginning  of  a  10-second 
interval  during  which  participants  removed  their  hand  from  the  slide 
and  placed  it  on  the  table  in  a  predetermined  resting  position.  During 
rest  periods  the  experimenter  recorded  recall  accuracy  to  the  nearest 
millimeter  (when  appropriate)  and  repositioned  the  stop  in  preparation 
for  the  next  trial.  After  "Rest,"  participants  heard  "Ready"  and  the 
command  sequence  for  the  next  trial  began.  During  the  retention  segment 
of  the  experiment,  intervals  of  3  minutes  and  24  hours  were  inserted 
between  "Rest"  and  "Ready."  In  general,  participants  were  instructed 
not  to  count  during  movements  and  shown  the  approximate  movement  speed 
(i.e.,  125  mm/sec)  desired  by  the  experimenter.  Prior  to  making  the 
first  movement,  participants  donned  their  blindfold  and  earphones,  and 
then  were  given  a  10-second  opportunity  to  move  the  slide  and  get  a 
feel  for  its  movement  characteristics. 
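The command timing described above can be laid out as a simple schedule. The sketch below is an illustration only: it assumes that "Ready" for the next trial follows immediately at the end of each 10-second rest and that no other gaps occur within a cycle, neither of which the text states explicitly. The names `cycle_schedule`, `READY_S`, `MOVE_S`, and `REST_S` are invented for the example.

```python
READY_S = 5   # seconds from "Ready" to the movement command
MOVE_S = 5    # seconds allowed for movement execution
REST_S = 10   # seconds of rest after "Rest"


def cycle_schedule(trial_types):
    """Return (events, total_seconds) for one training cycle.

    trial_types is a sequence of "p" (presentation) or "t" (test) markers.
    Each event is a (start_time_in_seconds, command) pair.
    """
    t = 0
    events = []
    for kind in trial_types:
        events.append((t, "Ready"))
        t += READY_S
        events.append((t, "Movement" if kind == "p" else "Recall Movement"))
        t += MOVE_S
        events.append((t, "Rest"))
        t += REST_S
    return events, t


# A Group DT cycle: one criterion p-trial followed by five recall t-trials.
events, total = cycle_schedule(["p"] + ["t"] * 5)
for start, command in events[:6]:
    print(f"{start:3d} s  {command}")
print("cycle length (s):", total)
```

Under these assumptions each trial occupies 20 seconds, so a six-trial cycle would last about two minutes.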
Results


Algebraic (signed) and absolute (unsigned) error scores were recorded
for each t-trial performed during the retention segment of the experiment.
No acquisition data were analyzed because yoking prevented any differences
in group performances. Each performance measure was analyzed separately.
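The two error measures can be illustrated with a short Python sketch. The sign convention (recalled value minus criterion, so that overshoots are positive) and all of the numbers are assumptions made for the example; the paper does not state them.

```python
from statistics import mean


def algebraic_error(recalled_mm: float, criterion_mm: float) -> float:
    """Signed error: positive for an overshoot, negative for an undershoot."""
    return recalled_mm - criterion_mm


def absolute_error(recalled_mm: float, criterion_mm: float) -> float:
    """Unsigned error: magnitude of the miss regardless of direction."""
    return abs(recalled_mm - criterion_mm)


# Hypothetical recall attempts against a 250 mm criterion.
criterion = 250.0
recalls = [262.0, 241.0, 255.0, 238.0]

signed = [algebraic_error(r, criterion) for r in recalls]
unsigned = [absolute_error(r, criterion) for r in recalls]

print("mean algebraic error:", mean(signed))
print("mean absolute error:", mean(unsigned))
```

Overshoots and undershoots cancel in the algebraic mean but not in the absolute mean, which is one reason the two measures can lead to different statistical outcomes.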

Retention was examined using a 2x2 mixed factorial Group (DP, DT or
LP, LT) by Retention Interval (3 minutes, 24 hours) analysis of variance
(ANOVA). Separate ANOVAs were performed on the algebraic and absolute
error scores for the two distance groups (DP, DT) and the two
end-location groups (LP, LT). No significant (p < .05) effects of interest
were found for algebraic error; therefore, only absolute error scores are
reported.

Distance.  Mean  absolute  error  scores  are  shown  in  Figure  2.  The 
scores  for  the  two  distance  groups  (i.e.,  DP,  DT)  are  on  the  left  and 
those  for  the  two  end-location  groups  (i.e.,  LP,  LT)  are  on  the  right. 

The absolute error ANOVA revealed no significant main effects but a
significant group x retention interval interaction, F(1, 28) = 6.85. The
rejection region for this and all other analyses was .05. This interaction
resulted from an increase in recall error over time for Group DP and an
associated decrease in recall error over time for Group DT. Individual
comparisons of simple main effects using the least significant difference
method  (Carmer  &  Swanson,  1973)  revealed  that  the  Group  DP  error  increase 
was  revealed  that  3  minutes  after  training  no  difference  in  recall  error 
existed  between  Groups  DP  and  DT,  whereas  24  hours  after  training  Group 
DP  displayed  greater  recall  error  than  that  of  Group  DT. 
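For readers who want to see how a group x retention interval interaction of this kind is computed, the sketch below uses a standard identity: in a 2x2 mixed design, the interaction F equals the squared independent-samples t statistic on each participant's change score (24-hour error minus 3-minute error), with 1 and N - 2 degrees of freedom. With 15 participants per group this gives the F(1, 28) reported above. The function name and the data are hypothetical; this is not the author's analysis code.

```python
from statistics import mean, variance


def interaction_f(group_a, group_b):
    """Group x interval interaction for a 2x2 mixed design.

    Each group is a list of (error_at_3_min, error_at_24_h) pairs, one pair
    per participant. Returns (F, error_degrees_of_freedom).
    """
    # Change score for each participant: later error minus earlier error.
    d_a = [late - early for early, late in group_a]
    d_b = [late - early for early, late in group_b]
    n_a, n_b = len(d_a), len(d_b)
    df_error = n_a + n_b - 2
    # Pooled variance of the change scores across the two groups.
    pooled = ((n_a - 1) * variance(d_a) + (n_b - 1) * variance(d_b)) / df_error
    t = (mean(d_a) - mean(d_b)) / (pooled * (1 / n_a + 1 / n_b)) ** 0.5
    return t * t, df_error


# Hypothetical absolute errors (mm) for four participants per group.
presentation = [(10, 16), (12, 15), (8, 15), (11, 14)]
test = [(11, 10), (9, 10), (12, 9), (10, 10)]

f_value, df_error = interaction_f(presentation, test)
print(f"F(1, {df_error}) = {f_value:.2f}")
```

The resulting F would then be compared with the critical value for the chosen rejection region (.05 in this paper).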

End-location. The absolute error ANOVA for end-location revealed
a significant main effect of group, F(1, 28) = 5.85, demonstrating greater
posttraining recall error for Group LP than for Group LT, and a group x
retention interval interaction that approached significance, F(1, 28) = 3.11,
.05 < p < .10. Although nonsignificant by conventional standards, further
analysis of simple main effects associated with this interaction was
justified by a priori expectations of training method outcome as indicated
by  the  results  obtained  for  distance  cue  recall.  As  shown  in  Figure  2, 
the  marginal  interaction  resulted  from  an  increase  in  recall  error  after 
training  for  Group  LP  while  Group  LT  error  remained  almost  unchanged. 
Individual  comparisons  revealed  that  the  Group  LP  increase  was  significant, 
and  that  Group  LT  error  was  statistically  stable.  Group  recall  performance 
did  not  differ  3  minutes  after  training  while  24  hours  after  training 
Group LP error was significantly greater than Group LT error. Conservatively
speaking, the absolute error data for both movement distance and
end-location cues reveal that training methods that emphasize testing (i.e.,
DT, LT) prevent posttraining task retention decrements, whereas those
that emphasize presentation produce marked posttraining retention decrements.
Thus,  even  the  yoking  procedure  used  in  the  present  experiment  to  increase 
movement  variability  during  training  was  unable  to  prevent  forgetting 
when  p-trials  were  emphasized. 


Figure 2. Mean absolute error on retention t-trials for distance
and end-location training method groups.


Discussion 


The purpose of this experiment was to explain previous data showing
that repeated testing during training is more effective than repeated
presentation in promoting long-term motor task retention (Hagman, 1980a, b).
Two hypotheses were tested. The first stated that retention benefits
were caused by differences in the learning (encoding) characteristics of
p- and t-trials due to differences in movement execution mode. The
second  hypothesis  stated  that  retention  benefits  were  the  result  of 
increased  movement  variability  produced  by  t-trial  execution  during 
training.  The  present  absolute  error  differences  found  between  Groups 
DP  and  DT  and  between  Groups  LP  and  LT  support  the  execution  mode  hypothesis. 
Although  p-  and  t-trial  variability  was  equated  during  training  through 
yoking,  retention  differences  at  24  hours  after  training  still  favored 
the  t-trial  repetition  groups  for  both  distance  and  end-location  cue 
recall.  Thus,  the  variability  hypothesis  is  not  supported. 


How  does  movement  mode  influence  retention?  As  suggested  earlier 
(Hagman, 1980b), in multitrial training situations where either p- or
t-trials are emphasized through repetition, learners base later recall
attempts  on  their  retention  of  movement  cues  generated  at  repeated 
trials.  It  is  easier  to  remember  t-trial  cues  than  p-trial  cues  because 
the  former  are  learner-defined.  Better  retention  of  learner-defined 
cues  comes  from  the  learner’s  ability  to  predict  or  anticipate  movement 
cues  prior  to  initiation.  According  to  Kelso  (1977),  "when  a  person  is 
able  to  predict  movement,  two  sets  of  signals  are  generated;  (a)  the 
downward  discharge  to  effector  organs,  and  (b)  a  simultaneous  central 
discharge  from  motor  to  sensory  centers  that  presets  sensory  systems  for 
the anticipated consequences of the motor act" (p. 35). Thus, the role
of anticipation or prediction is to enhance the encoding of movement
kinesthetic information arising from muscles and joints (Kelso, 1977;
Stelmach et al., 1976). An extension of this corollary discharge
theory  can  explain  the  superior  retention  resulting  from  t-trial  repetition. 
It  is  argued  that  at  t-trials  cortical  sensory  centers  are  more  prepared 
to  receive  incoming  afferent  impulses  from  muscles  and  joints,  since 
movement  consequences  can  be  anticipated.  At  p-trials,  on  the  other 
hand, this would be more difficult since little if any prior information
is  available  regarding  the  terminal  locus  of  the  movement.  It  is  this 
superior  encoding  of  t-trial  cues  relative  to  p-trial  cues  that  causes 
superior  long-term  retention. 

Finally,  it  should  be  mentioned  that  although  the  present  results 
rule  out  variability  per  se  as  the  cause  of  t-trial  retention  effects, 
they do not rule out the possibility that variability contributes to
retention, but does so only when generated during learner-defined movements.
It could be argued, for example, that the effects of variability are
dependent on movement mode, and perhaps vice versa. Although the present
experiment  does  not  discount  this  interpretation,  no  data  have  been 
reported  either  to  suggest  or  support  it.  Therefore,  it  remains  highly 
speculative,  yet  worthy  of  future  research. 

Conclusions


The  results  of  this  experiment  help  to  clarify  past  research  findings 


task  retention.  In  doing  so,  they  assist  the  Army  in  its  quest  to 
identify  training  methods  that  produce  the  highest  levels  of  motor  task 
acquisition  and  retention. 

From  the  results  it  can  be  concluded  that:  (a)  Training  methods 
that  provide  for  increased  opportunities  for  testing  improve  long-term 
motor  task  retention;  (b)  these  benefits  derive  from  the  superior  encoding 
of  learner-defined  movements  performed  during  t-trials,  relative  to 
experimenter-defined  movements  performed  during  p-trials;  (c)  increased 
variability  of  movement  caused  by  t-trial  repetition  during  training  is 
not  responsible  for  the  obtained  retention  benefits  associated  with 
testing;  (d)  testing  during  training  benefits  both  movement  distance 
and  end-location  cue  retention. 
Differential Effectiveness and Efficiency of Individualized Instruction:
I. Background and Study Design

The  Navy's  Training  Analysis  and  Evaluation  Group  (TAEG)  conducted 
an  empirical  study  to  determine  if  measures  of  training  effectiveness 
or  training  efficiency  were  differentially  related  to  the  method  of 
instruction  used  to  train  Navy  technical  school  graduates.  Also 
investigated were the interrelationships between methods of instruction,
ability levels of graduates, and the types of tasks taught in the
technical schools. Since the same content was not taught in courses
conducted under each method, it was necessary to equate the content of
the 20 courses (i.e., tasks taught) to a common base so that appropriate
comparisons could be made. For this purpose a generic task
classification  system  was  developed.  School  achievement  and  job 
performance  measures  obtained  on  over  5000  technical  school  graduates 
served  as  the  criterion  variables.  This  paper  presents  the  background 
for  and  design  of  the  study. 
DIFFERENTIAL EFFECTIVENESS AND EFFICIENCY OF INDIVIDUALIZED INSTRUCTION:

I.  BACKGROUND  AND  STUDY  DESIGN 

Eugene R. Hall and Jon S. Freda

Training  Analysis  and  Evaluation  Group 
Orlando,  Florida  32813 

Individualized  instruction  has  become  a  controversial  issue  in  military 
training.  Many  individuals  in  both  training  and  operational  settings  have 
come  to  believe  that  individualized  instruction  (II)  is  not  a  desirable  or 
effective way to train students for operational job assignments. The
widespread belief is that conventional classroom group-paced (GP) methods result
in  better  trained  personnel. 

Currently,  the  U.S.  Navy  conducts  technical  training  under  both  II  and  GP 
instructional  methods.  The  principal  II  methods  used  are  computer  managed 
instruction  (CMI)  and  self-paced  (SP)  instruction.  Because  of  the  potential 
for  reduced  student  training  time,  the  Navy  plans  to  individualize  still  more 
of  its  courses.  However,  in  view  of  concerns  expressed  by  fleet  units,  the 
Chief  of  Naval  Education  and  Training  (CNET)  tasked  the  Training  Analysis  and 
Evaluation Group (TAEG) to conduct a study to examine the effects of
individualized instruction.

Purpose 

The  purpose  of  the  study  was  to  determine  if  individualized  instruction 
is  more  or  less  effective  and/or  efficient  than  conventional  instruction,  and 
further, if these effects differentially relate to:

. training individuals of differing ability levels and/or

. training different types of tasks.

The  present  paper  presents  details  of  the  methodology  employed  to  conduct 
the  study.  A  subsequent  paper  in  this  volume  (Freda,  Hall  and  Ford)  presents 
and  discusses  study  findings. 

Method 

Since experimental methods were impractical to employ within the context
of an ongoing military training system, a correlational approach was used to
conduct the study. Under this approach, statistical analyses are performed on
existing record data to determine if significant relationships exist among
variables of interest. Because of limitations inherent in available data,
however, caution is required in the interpretation of results obtained. Also,
there are limits on the generalizations that can be made. While some results
may be viewed as definitive, others must be viewed as only suggestive.
Further, certain of the results should be considered applicable to only the
courses included in the study. These situations will be appropriately noted
in the present text and in the subsequent paper by Freda, Hall, and Ford.


Study  Questions 

The  TAEG  study  was  designed  to  answer  two  major  questions  derived  from 
the original CNET tasking:

1.  Are  significant,  relative  differences  in  training  effectiveness  and/or 
efficiency  associated  with  individualized  instruction  versus  group-paced  (GP) 
instruction? 

2.  If  there  are  differences,  are  they  related  to  different  kinds  of 
training  tasks  and/or  ability  levels  of  trainees? 

Study  Variables 

The study examined seven major variables. The major predictor variables
were method of instruction, trainee ability level, and type of training task.
The major criterion variables were training costs, end of course grades, time
to complete the course, and Naval Education and Training Command (NAVEDTRACOM)
Training Appraisal System (TAS) ratings (see below). Training costs and time
to complete were viewed as training efficiency measures, and end of course
grades and TAS ratings as internal and external measures of training
effectiveness, respectively.

Predictor  Variables. 

Method of Instruction. The primary predictor variable of the study was
instructional method. Two basic methods were of interest: Individualized
Instruction (II) and Conventional Instruction (CI). II is defined as an
instructional strategy in which the learning activities are designed to
accommodate individual differences in background, skill level, aptitudes, and
cognitive styles. For the present study, II involved self-paced and
computer-managed courses. CI is defined as an instructional strategy in which
learning activities are directed toward a normative model of the target
population characteristics. CI usually occurs in a group environment (i.e.,
is group-paced).

To develop lines of inquiry for the study, a brief literature survey was
conducted. Previous findings from reviews of II versus CI used in military
settings (e.g., Orlansky & String, 1979) indicate that:

. II is as effective as CI in terms of end of course achievement scores

. the efficiency of II is significantly better than that of CI in terms
of student time to complete instruction

. training costs for II are lower due to the student time savings

. the addition of computer support (either CAI or CMI) to self-paced
instruction does not significantly increase student time savings.

Orlansky and String also noted that the previous research contrasting II
and CI was confined to comparisons on measures of student learning achievement
available at the schools. External criteria of training effectiveness were
not used for assessing the effects of different instructional methods.
The relevance of previous research for the TAEG study then was that:

. school performance scores (i.e., end of course grades) should be
equivalent for CI versus II graduates overall

. time to complete training should be significantly less for II than
for CI

. student time to complete training should be equivalent for SP versus
CMI.

Since  no  previous  data  were  reported  for  method  effects  on  external  criteria 
of  training  effectiveness,  these  analyses  were  considered  wholly  exploratory. 

Ability  Level.  Ability  levels  of  trainees  constituted  the  second  major 
predictor  variable  of  interest.  For  the  present  study,  ability  levels  of  trainees 
were  considered  to  be  reflected  by  student  scores  on  the  Armed  Services  Vocational 
Aptitude  Test  Battery  (ASVAB).  This  test  is  routinely  administered  to  all 
armed  services  enlistees. 

The  ASVAB  consists  of  12  subtests.  Various  combinations,  or  composites, 
of  ASVAB  subtest  scores  are  used  by  the  Navy  to  determine  an  individual's 
eligibility for attendance at specific technical schools. Scores from three
of the battery subtests can also be used to derive an Armed Forces
Qualification Test (AFQT) percentile equivalent. These percentile equivalent
scores provide a measure of general ability. They can also be used to group
individuals into mental categories. AFQT percentile scores (derived from the
pre-October 1980 conversion routines) were used to represent ability levels of
graduates  for  making  comparisons  across  courses. 
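As an illustration of grouping by mental category, the sketch below maps an AFQT percentile to the conventional category labels (I, II, IIIA, IIIB, IV, V). The cut points used are the commonly published ones; the paper does not list them, so whether the study applied exactly these boundaries is an assumption, and the function name `afqt_category` is invented for the example.

```python
# Lower percentile bound for each conventional AFQT mental category,
# ordered from highest to lowest. These boundaries are an assumption
# here; the source text does not state them.
AFQT_CATEGORIES = [
    (93, "I"),
    (65, "II"),
    (50, "IIIA"),
    (31, "IIIB"),
    (10, "IV"),
    (1, "V"),
]


def afqt_category(percentile: int) -> str:
    """Return the mental category label for an AFQT percentile (1-99)."""
    if not 1 <= percentile <= 99:
        raise ValueError("AFQT percentile must be between 1 and 99")
    for lower_bound, label in AFQT_CATEGORIES:
        if percentile >= lower_bound:
            return label
    raise AssertionError("unreachable: every valid percentile has a category")


for score in (95, 70, 55, 40, 20, 5):
    print(score, afqt_category(score))
```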

Types  of  Training  Tasks.  Definitive  conclusions  concerning  the  value  of 
II versus CI for training different types of tasks require that the same tasks
be  taught  under  each  method.  Unfortunately,  the  same  courses  (and,  consequently, 
the  same  tasks)  are  not  taught  under  each  of  the  two  basic  methods  of  instruction, 
and  it  was  beyond  the  scope  of  the  study  to  create  this  condition  experimentally. 
Thus,  the  matching  of  tasks  under  different  methods  of  instruction  was  approximated 
through  the  use  of  a  generic  task  classification  system.  The  content  of  the 
different  courses  (i.e.,  tasks  taught)  was  equated  to  a  common  base  so  that 
appropriate  comparisons  could  be  made.  The  use  of  these  comparisons  assumes 
that  the  psychological  processes  involved  in  the  acquisition  of  generic  skills 
(e.g.,  decision  making)  are  the  same  regardless  of  the  specific  context  in 
which  the  learning  occurs. 

For  the  present  TAEG  study,  a  modification  of  the  Instructional  Quality 
Inventory (IQI) method (Ellis, Wulfeck and Fredericks, 1979) was used to equate
skill and knowledge items taught in each of the courses to a common base. Subject
matter experts (SMEs) at each school classified statements of the school's learning
objectives into one of five types of information content: fact, category,
procedure, rule and principle. The IQI definitions of content type (see Ellis
et al.) were retained. The five classifications were then interpreted as generic
tasks  so  that  comparisons  could  be  made  across  the  courses. 
Criterion Variables.


Training Effectiveness Measures. Two sets of measures of students' learning
achievements, which indicate course training effectiveness, were of interest to
the  study: 


.  internal  measures  which  reflected  how  well  students  performed  in 
school 

.  external  measures  which  reflected  graduate  job  performance 

Internal  measures  of  student  learning  achievement  were  obtained  from  the 
schools.  These  consisted  principally  of  end  of  course  grades.  Other  internal 
measures  of  course  data  reflecting  training  effectiveness  were  not  routinely 
available.  External  measures  of  student  learning  achievement  were  obtained 
from  the  NAVEDTRACOM  Training  Appraisal  System  Data  Base.  These  consisted  of 
fleet  supervisor  ratings  of  the  adequacy  of  school  training  for  particular 
tasks  which  graduates  are  expected  to  perform  on  the  job. 

The  CNET  Special  Assistant  for  Training  Appraisal  (CNET  015)  routinely 
collects  feedback  data  via  mailout  questionnaires  from  first-level  fleet 
supervisors  of  recent  (i.e.,  4  to  10  months  after  graduation)  technical  school 
graduates.  Random  samples  of  graduates  are  drawn  from  the  total  pool  of  course 
graduates  during  a  given  time  frame.  Fleet  supervisors  rate  on  a  5-point  scale 
(a  "1"  equals  "unsatisfactory"  and  a  "5"  equals  "much  more  than  satisfactory") 
the  adequacy  of  school  training  for  an  identified  course  graduate.  Training 
adequacy  judgments  are  made  for  a  number  of  specific  tasks  for  which  a  given 
technical  school  provided  training.  The  task  statements  listed  on  a  feedback 
questionnaire  are  currently  prepared  by  technical  training  staff  for  a  given 
course.  The  statements  are  based  on  the  learning  objectives  of  that  course. 

The  TAS  data  base  contained  both  a  listing  of  tasks  taught  at  a  school, 
and  ratings  of  training  adequacy  for  these  tasks.  The  questionnaire  task 
statements  which  reflect  specific  skills  and  knowledges  taught  at  a  school 
were  classified  by  school  SMEs  into  the  generic  task  categories  described  above. 
Thus,  specific  generic  tasks  could  also  be  matched  to  supervisor  training  adequacy 
ratings  on  specific  items  to  permit  comparisons  within  and  between  courses. 
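The matching of generic task categories to supervisor ratings can be sketched as a small aggregation. Everything specific in this example is hypothetical: the item identifiers, the item-to-category assignments, the ratings, and the function name `mean_rating_by_task`. Only the five category names and the 1-to-5 rating scale come from the text.

```python
from collections import defaultdict
from statistics import mean

# The five IQI content types, interpreted in the study as generic tasks.
GENERIC_TASKS = ("fact", "category", "procedure", "rule", "principle")


def mean_rating_by_task(item_tasks, ratings):
    """Average the 1-5 training adequacy ratings within each generic task.

    item_tasks maps a questionnaire item id to its generic task category.
    ratings is a list of (graduate_id, item_id, rating) tuples.
    """
    by_task = defaultdict(list)
    for _graduate, item, rating in ratings:
        task = item_tasks[item]
        if task not in GENERIC_TASKS:
            raise ValueError(f"unknown generic task: {task}")
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be on the 1-5 scale")
        by_task[task].append(rating)
    return {task: mean(values) for task, values in by_task.items()}


# Hypothetical SME classifications and supervisor ratings.
item_tasks = {"Q1": "procedure", "Q2": "fact", "Q3": "procedure"}
ratings = [
    ("g1", "Q1", 4), ("g1", "Q2", 3), ("g1", "Q3", 5),
    ("g2", "Q1", 3), ("g2", "Q2", 4), ("g2", "Q3", 2),
]

result = mean_rating_by_task(item_tasks, ratings)
print(result)
```

Because every course's items are expressed in the same five categories, means computed this way can be compared across courses that teach different specific content.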

Training  Efficiency  Measures.  Two  measures  were  used  to  reflect  training 
efficiency:  training  costs  and  student  time  (contact  hours  of  instruction)  to 
complete  a  course.  Training  cost  data  were  obtained  from  the  CNET  "Per  Capita 
Cost  Data  Base"  for  each  course  in  the  study.  The  training  cost  data  were 
referenced  to  fiscal  year  1979  which  coincided  with  the  time  period  that 
graduates  of  interest  to  the  study  were  in  training.  From  the  cost  data  base, 
a  "training  cost  to  produce  one  graduate  per  course  session"  was  derived  and 
used  in  subsequent  analyses.  Student  course  completion  times  were  obtained 
for individualized courses from CMI files and from class records at the schools
for SP courses. Completion times for CI courses were obtained from the schools
and  from  CNET  computer  files. 

Table  1  identifies  the  major  predictor  and  criterion  variables  of  the 
study.  The  measures  of  the  variables  are  listed  as  are  the  data  sources.  More 
complete  descriptions  of  the  study  samples  are  presented  below. 
TABLE 1. MAJOR PREDICTOR AND CRITERION VARIABLES

VARIABLE                     MEASURES                    SOURCE

PREDICTOR
  Ability Level              AFQT, Composites            CNET 015
  Method of Instruction      SP, CMI, GP                 CNET, CNTECHTRA
  Type of Training Task      Five Generic Tasks          IQI (SMEs)

CRITERION
 EFFECTIVENESS
  End of Course Grades       Final Grades                School Records (SP+GP),
                                                         CNTECHTRA (CMI)
  Training Adequacy Ratings  1-5 Scale                   TAS (CNET)
 EFFICIENCY
  Cost of Course             Average Cost to Produce     CNET Accounting System
                             1 Graduate (per session)
  Time to Complete           Contact Hours               CNTECHTRA (CMI):
                                                         School Records,
                                                         SMEs, NITRAS

Study  Samples 


Course  Samples. 

The  TAEG  project  staff  coordinated  with  CNET  and  Chief  of  Naval  Technical 
Training  (CNTECHTRA)  staff  to  select  courses  for  inclusion  in  the  study.  The 
plan  was  to  identify  approximately  10  A-level  courses  that  used  conventional 
(i.e.,  group-paced)  instruction  and  10  others  that  used  individualized 
instruction  (i.e.,  self-paced  or  computer-managed).  Courses  in  each 
instructional  category  were  to  include  the  full  range  of  ability  levels  of 
individuals  who  undergo  Navy  technical  training.  It  was  also  desired  that 
courses  in  each  category  be  roughly  matched  on  general  instructional  content 
(i.e., type of training tasks) and on geographic location.

Initial  selection  of  courses  for  inclusion  in  the  methods  of  instruction 
groups  was  made  on  the  basis  of  entries  in  the  Navy  Integrated  Training  Resources 
and  Administrative  System  (NITRAS).  This  system  identifies  courses  that  are 
considered to be individualized (SP or CMI) and those considered to be taught
conventionally  (i.e.,  GP).  It  should  be  noted  that  these  instructional 
methods  are  not  purely  applied  to  all  portions  of  the  courses. 

Courses selected for the study were those classified as "A-level." "A-level"
courses provide basic skill and knowledge training for entry level Navy
jobs.  These  courses  were  studied  rather  than  C-level  courses  because: 

.  a  greater  diversity  of  tasks  is  trained  in  the  more  general  A-level 
courses  than  in  the  largely  equipment-specific  C-level  courses 

. a wider range of student abilities is involved in A-level training

.  proportionately  more  A-level  courses  are  taught  under  individualized 
instruction 

It  was  further  desired  that  courses  be  selected  to  the  extent  possible 
from  those  for  which  TAS  data  were  already  available  or  soon  to  be  available 
(i.e.,  within  approximately  the  next  6  months).  Although  CNET  015  training 
appraisal  schedules  could  be  altered,  this  would  result  in  lengthy  time  delays 
to  obtain  data  on  course  graduates  and  would  disrupt  the  ongoing  work  of  the 
schools  and  CNTECHTRA  staff  codes. 

Twenty-three  A-level  courses  were  nominated  by  CNTECHTRA  for  inclusion  in 
the  study.  Although  availability  of  the  TAS  data  was  the  major  determinant  of 
the sample composition, it is believed that the other criteria were reasonably
well met and that the courses constitute a fair sample (approximately 15
percent) of Navy "A" schools. Two officer courses were also selected for
study. These courses, "Damage Control Assistant," were ostensibly the same
course  taught  under  a  different  method  of  instruction  at  each  of  two 
locations.  In  addition,  data  were  obtained  from  two  basic  CMI  courses. 
Graduates  of  the  RM  Sea  "A"  School  and  the  RM  Shore  "A"  School  also  attended 
the  RM  Basics  course.  Their  records  were  obtained  from  CNTECHTRA  CMI  files 
and  used  in  analyses  of  interest  to  the  study.  Similarly,  graduates  of  the 
EN,  MM  600  psi  and  MM  1200  psi  schools  attended  the  Propulsion  Engineering 
(PE)  Basics  course  prior  to  entry  into  their  respective  "A"  schools.  Their 
records  were  also  obtained  from  CMI  files. 


Graduate  Samples 

Information  pertaining  to  the  number  of  graduates  per  course  for  whom 
data  were  available  for  selected  variables  is  presented  in  table  2.  Data  were 
obtained from a total of 7,083 records of enlisted personnel and officers who
were graduated from the schools between August 1978 and April 1980. This
total included duplicate records of graduates from basic courses and the
dual-phase HT courses.
TABLE 2. NO. OF GRADUATES PER COURSE FOR WHOM DATA WERE AVAILABLE FOR FOUR SELECTED VARIABLES

[Table body not recoverable. Surviving column heading fragments: "Original," "Time," "End of," "Ratings."]

Notes: Course type based on designation during 1978-1980. Converted to CMI in FY-81. Initially based on school records and/or SME reports.

Data  Collection 


The  names,  graduation  dates,  and  SSNs  of  school  graduates  were  obtained 
from NAVEDTRACOM TAS files. Visits were made to the schools between August
and November 1980 to obtain data on course graduates. At the schools, data
were manually recorded from class records and entered on worksheet forms for
subsequent entry into computer files. Data recording was accomplished either
by TAEG project staff or school SMEs functioning under general TAEG supervision.
Information recorded consisted principally of end of course grades and time
to complete training. Where available, other measures of training
effectiveness/efficiency were also recorded. These other measures included
numbers of academic remediations and setbacks, and numbers of additional hours
of instruction required. Cost data were obtained from CNET.

Training  adequacy  ratings  and  questionnaire  task  statements  were  obtained 
from  the  CNET  Training  Appraisal  System  (TAS)  Data  Bank.  The  data  included 
the  fleet  supervisors'  TAS  ratings  for  each  graduate  (i.e.,  1,  2,  3,  4  or  5) 
on  each  skill/knowledge  item  of  the  course  feedback  questionnaire,  and  the 
mean  TAS  rating  (computed  over  all  items  of  the  questionnaire)  for  each  graduate 
of  the  course.  The  twelve  ASVAB  subtest  scores  of  each  graduate  were  also 
obtained  from  the  CNET  TAS  data  base  (or  the  student  master  file  when  necessary). 
During  school  visits,  assigned  SMEs  classified  the  items  on  the  TAS  questionnaires 
into the generic task categories. Their classifications were also entered
into  the  data  base. 


Data  Analysis 

Analyses  of  training  effectiveness  and  training  efficiency  data  were  conducted 
on four different groupings of the courses in the study. These groupings are
shown  in  table  3. 


First Analysis. The first analysis was based on data of 19 single-phase enlisted
"A" schools. The purpose of this analysis was to investigate possible differences
between conventional and individualized instruction across courses as well as
to assess interrelationships with the two other predictor variables (i.e.,
student ability level and type training task).

Second Analysis. The second analysis was based on data of two dual-phase enlisted
"A" schools. In these schools, enlisted personnel received a different method
of instruction in each phase of the "A" school. A major purpose of the second
analysis  was  to  investigate  the  possibility  of  transfer  effects  between  methods 
of  instruction  within  the  same  group  of  graduates. 

Third Analysis. The third analysis was based on data of two single-phase officer
courses. Both officer courses nominally presented similar subject matter but
each under a different method of instruction. Thus, the purpose of the third
analysis was to investigate possible differences in methods of instruction
between the two officer courses when training content is held "constant."


Fourth Analysis. The fourth analysis was based on data obtained from all the
courses included in the previous three analyses plus three basic, pre-"A" school
courses. The purpose of the fourth analysis was to provide an in-depth investigation
of the effectiveness and efficiency of training.


The  findings  from  the  first  analysis  are  presented  in  the  subsequent  paper 
in  this  volume  by  Freda,  Hall,  and  Ford.  The  findings  of  the  other  analyses 
will be contained in reports currently being prepared by the TAEG.

TABLE 3. DESCRIPTION OF COURSES INCLUDED IN THE FOUR ANALYSES
RELATING SELECTION PREDICTIONS TO ATTRITION
IN THE BRITISH INFANTRY


L Heaton, Army Personnel Research Establishment,
Farnborough, Hants, UK


The current selection system for non-officer ranks is described with
details of previous validation work. Typically low non-significant
correlation coefficients were obtained with Infantry groups. Current
work in this area is severely hampered by the lack of suitable criteria
but strategies for overcoming this are under discussion. Length of
service was correlated with selection results for an Infantry group and
no correlations could be demonstrated. However, data used show that
exit rates are dependent on individual training depots. The effects of
civilian unemployment in depressing wastage directly and indirectly
(by raising the ability of serving soldiers as a population) are discussed.


RELATING  SELECTION  PREDICTIONS  TO  ATTRITION  IN  THE  BRITISH  INFANTRY 


INTRODUCTION 

1. The initial aim of this research was to evaluate the validity of the
tests currently used to select Infantry soldiers in the British Army.
However, due to problems associated with the availability of suitable
criteria for these purposes, the data collected were used for a second
purpose, that of examining the wastage rates from two comparable
Infantry training depots. Unemployment, a key factor in this instance,
is also relevant in any discussion of the recruitment pool and
consequently test effectiveness. These three topics, validity, attrition
and unemployment, are therefore the main areas discussed in this paper.

BACKGROUND 

2. The British Army was established as an all-volunteer force from 1960.
In recent years, recruitment targets have been set around the 20,000 mark
(men, women and junior entry) though, due to the current economic climate,
targets are currently set much lower to effect a decline in the attrition
rate. In 1971 a centralized selection system was set up and recruitment
and selection is now seen as a two stage process. Recruitment, career
guidance and initial selection are carried out at the local Army Careers
Information Offices (there are approximately 170 of these distributed in
major towns and cities in the UK). Subsequent allocation to one of the
180 employments for adult males is made at the centralized selection centre
on the basis of results from psychometric tests (pencil and paper achievement
and reasoning tests), interview, medical and fitness tests. Final allocation
is dependent on there being vacancies to match the applicant's level of
ability and expressed choice.

3. Infantry soldier is just one of the employments which may be chosen and
subsequently allocated. Approximately one third of a year's intake will be
allocated as Infantry soldiers. The selection requirements in terms of test
scores required are comparatively low, and the distribution of ability is
somewhat skewed, as is shown below.

[Figure: Distribution of Ability, Entrants]


VALIDATION


4. In 1976 a study was undertaken by Killcross et al to determine the
validity of the five battery tests in use at that time. Training criteria
were used in this instance. The coefficients produced for different Army
groups (Infantry, Royal Artillery (RA), Royal Corps of Transport (RCT),
Royal Engineers (RE) and Royal Armoured Corps (RAC)) are shown below.


Validity  of  Battery  tests  with  different 


As can be seen, the validity coefficient for the Infantry is lower than
that for the other Arms and Corps investigated. This difference may have
been even greater if job performance criteria were used, as one would
expect greater degradation of training from lower ability groups such as
the Infantry. Job content in the Infantry is also less closely related
to attainment type tests than training criteria and possibly job content
of more technical trades, another reason shown by Ghiselli (1966) to cause
low validity coefficients.

5. Since this work was undertaken, the battery of 5 tests has been
revised. All the tests have been changed to consistent multiple-choice
format and the item content has been updated somewhat. The complete
battery came into use in January 198). At this point the problems of
validating the revised tests were first noted. Having 180 employments,
each with different entry requirements, it is both time-consuming and
ineffective to manually match selection and training records for each
employment. It is essential that information is available for individual
training groups (different Arms and Corps undergo training of different
types and lengths) as the validity estimate produced for a composite of
mixed training groups may be much lower than for the individual training
groups themselves (Killcross et al 1976). The use of the Army's
computerised manning and records system as a data base for validation is
an alternative approach.
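
The point about composites of mixed training groups can be illustrated with a small numerical sketch. The figures below are invented for illustration and are not taken from Killcross et al; the helper name `pearson_r` and the two hypothetical training groups are assumptions. Within each group the test score predicts the training mark perfectly, but because the two courses are assumed to mark on different scales, the coefficient for the pooled sample is far lower.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: the same test scores in two training groups whose
# courses award marks on different scales (group B marks run 10 higher).
test_scores = [1, 2, 3, 4, 5]
marks_group_a = [1, 2, 3, 4, 5]
marks_group_b = [11, 12, 13, 14, 15]

r_a = pearson_r(test_scores, marks_group_a)
r_b = pearson_r(test_scores, marks_group_b)
r_pooled = pearson_r(test_scores + test_scores, marks_group_a + marks_group_b)

print(f"group A r = {r_a:.2f}, group B r = {r_b:.2f}, pooled r = {r_pooled:.2f}")
```

The drop arises only because group membership adds criterion variance that the predictor cannot explain, which is one reason to validate within individual training groups.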


6. A complete listing of soldiers allocated to the Infantry for
November 1978 to October 1979 was obtained. As a sample, all those
allocated to the various regiments of the Prince of Wales Division were
selected. From the computerised system, limited information was then
available. This included (in addition to personal details) details of
initial selection (an overall grade only, no test-score details),
promotion, specialist courses taken and exit dates. As a maximum, a soldier
from our sample would have served three and a half years; few had
achieved promotion or specialist qualifications. The only criterion
available was length of service. If a soldier has a three year engagement
and wishes to leave at that three year point he must give notice at the
II month point. From our data, we were therefore aware of the soldier's
plans to leave, even if he had not yet left.

7. It was not possible, in this instance, to use performance on the
basic training course as a criterion, as detailed records are not
available, and the pass/fail criterion recorded is inadequate as all
those soldiers remaining in the Infantry do in fact pass. When the
results of individual test scores were correlated with length of service,
negligible coefficients were produced ranging from 0.01 to 0.05. This is
as would be expected, due to lack of relevance of the predictor to the
criterion (Nagle 1953).

8. As has already been noted, the current economic climate has radically
affected the size of the recruiting pool. In the past, when recruiting
was more difficult, the selection ratio was very high and for the Infantry
was almost equal to unity (excluding those unsuitable on medical or security
grounds). Under these circumstances no test will operate with great
efficiency and in fact there may be little point in using the tests at all
(Taylor and Russell 1939). Now that the selection ratio is more favourable,
the validity of the tests becomes more important. Recommendations for
changes to the records system, which would allow suitable selection and
training data to be stored, are currently under discussion.
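
The dependence of test usefulness on the selection ratio can be sketched with a small simulation in the spirit of Taylor and Russell. This is an illustrative sketch only: the validity of 0.3, the 50% base rate of success, the sample size and the function name `success_rate_among_selected` are all assumptions, not values from the paper or from the published Taylor-Russell tables.

```python
import math
import random


def success_rate_among_selected(validity, selection_ratio, n=20000, seed=0):
    """Simulate applicants and return the proportion of those selected who succeed.

    Test score x and criterion y are standard normal with correlation
    `validity`. "Success" is assumed to mean y > 0 (a 50% base rate).
    The top `selection_ratio` fraction of applicants by test score is selected.
    """
    rng = random.Random(seed)
    applicants = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        y = validity * x + math.sqrt(1.0 - validity ** 2) * noise
        applicants.append((x, y))
    applicants.sort(key=lambda pair: pair[0], reverse=True)
    k = max(1, round(selection_ratio * n))
    selected = applicants[:k]
    return sum(1 for _, y in selected if y > 0.0) / k


# Selection ratio near unity: the test cannot improve on the base rate.
rate_all = success_rate_among_selected(0.3, 1.0)
# A more favourable selection ratio: the same test now raises the success rate.
rate_top30 = success_rate_among_selected(0.3, 0.3)
print(f"select everyone: {rate_all:.3f}; select top 30%: {rate_top30:.3f}")
```

With everyone accepted, the success rate among those selected is simply the base rate whatever the validity; only when applicants can be turned away does even a modest validity pay off.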


ATTRITION

9. The data analysed did, however, show variations of interest.
The Prince of Wales Division consists of a number of regiments, each
recruited in a different area of the country. Some of the regiments
receive their basic training at the Prince of Wales training depot at
Crickhowell, South Wales, others at the second Prince of Wales depot at
Lichfield in the Midlands. Differences in attrition rates between
Divisions of the Infantry have already been demonstrated in an earlier phase
of this project. Wastage rates varied from 20 - 50% for different divisions
(1979-80 figures). Now, further analysis has revealed differences between
depots within a division.


[Figure: POW Division, Length of Service. Percent leaving by length of service (months), Lichfield and Crickhowell depots.]


10. The pattern of wastage [...] a whole. During the first 6 months [...]
leave the Army either free [...] sum. This period covers the time spent
in basic training. After the 6 month point the sum payable is [...]
choice until the 3 year point, when their engagement [...] out. The
majority of the [...] and most of those who remain after [...] at some
time during their first year to extend their engagement.

11. Great difference between the two training depots [...]
leaving (or having given notice to leave) at [...]. In considering
training wastage, it is not yet possible to separate the overall
effects of external [...]. [...] the Netherlands (Tronp, 1980) [...]
a proportion of wastage is dependent on the attitudes [...] of the
training depot staff. These are internal factors. Work carried out in
the UK (Dennison 1980) further relates wastage to civilian unemployment.
However, when examining subsequent attrition, it is likely that the
training depot is no longer influential; civilian unemployment is a [...].

12. The link between civilian [...] as unemployment rises, [...]. [...]
1980/81 unemployment has risen steeply in Britain and currently stands at
[...] of the working population.

13. Wastage is affected both directly and indirectly by unemployment
rates. The direct effect is illustrated by the Crickhowell/Lichfield
differences. Soldiers originally trained at Crickhowell come from Wales,
Devon and Dorset, Avon and Gloucestershire. Lichfield trained soldiers
originate from the Midlands and the South of England. These recruiting
areas are shown below.




14. These regions have differing levels of unemployment as shown below.


Overall, the Lichfield depot trains men from areas which have experienced
lower unemployment over the past 3 years than those areas recruited to
Crickhowell. We would expect attrition, especially at the three year point,
to be closely associated with regional unemployment. This effect would
be even stronger if individual regiments, drawn from one area only, were
considered.

15. The effect of unemployment will be found throughout the initial
training period in addition to the internal effects of the depot itself.
Unemployment also affects the system indirectly. Increasing unemployment
is one factor which improves the quality and quantity of applicants coming
forward, as shown by Bellany (1978). This improves the selection ratio
and therefore effectiveness of any testing procedure (Taylor and Russell 1939)
and increases the proportion of brighter candidates in the recruitment pool.
Such a change has been observed over time in the British Army.

16. It follows that a higher ability intake will, over time, produce fewer losses.
This effect will, of course, operate with a time lapse and may now be making
its first impressions on the wastage figures.

CONCLUSIONS

17. Though previous research has been problematic and has produced only
low validity coefficients, it is important to evaluate and monitor on a
regular basis the validities of selection tests, particularly, as is
currently the case, in times of favourable selection ratios. Length of
service is an inadequate criterion for this purpose. Unemployment has affected
both recruitment to, and wastage from, the Army, and local unemployment effects
should be taken into account in a model which attempts to explain retention
issues, in addition to factors in operation within the individual training
depots. It is important to draw methodological conclusions in this instance.
A broader approach to validation problems should be adopted as potentially
useful data may otherwise remain untapped.
Hicks, Jack M., US Army Research Institute for the Behavioral and Social
Sciences,  Alexandria,  Virginia.  (Tues.  P.M.) 


Counter-Attrition Programs in the US Military


Of special interest were "counter-attrition" programs in the
US Military, whether in the conceptual, research, development, or
operational stages. This review included the US Army, Navy, Air Force,
Marine Corps, and Coast Guard. Attention was addressed to any strategy
or mechanism intended to reduce undesirable attrition in a military
environment. Both up-front and downstream programs were scrutinized.
Programs addressing such major issues as realistic expectations,
transition from training to unit, and unit disillusionment were
considered. Of interest were pre-military preparation, coping skills,
behavioral modification, continuing education, retraining,
motivational adjustment programs, counseling, use of post facilities, unit
cohesion strategies, and the like.

The  primary  data  gathering  tool  was  the  personal  and  telephone 
interview.  Considering  the  exploratory  nature  of  this  investigation, 
the interviews were of an unstructured nature. Maximum latitude was
permitted  to  allow  for  expression  of  strongly  held  opinions,  and 
relevant sources of expertise.

Paper presentation at the Annual Meeting of the Military Testing Association,
Arlington, VA, October, 1981.

COUNTER-ATTRITION PROGRAMS IN THE MILITARY

Jack M. Hicks

U.S. Army Research Institute for the Behavioral and Social Sciences


High  first-term  enlisted  attrition  rates  continue  to  be  a  major 
concern  in  the  U.S.  Military.  The  purpose  of  this  paper  is  to  create 
increased  awareness  of  three  categories  of  military  programs  which 
either  presently  demonstrate,  or  show  promise  of  impacting  significantly 
upon  the  enlisted  attrition.  Programs  will  be  described  which  represent 
(a) pretraining, (b) realistic expectations, and (c) retraining.

The  military  pretraining  programs  to  be  represented  are  not  primarily 
designed  to  counter  attrition.  They  are  programs  conducted  by  the  Army 
National  Guard  which  combine  Guard  recruitment  with  job  preparation  and 
placement.  These  programs,  however,  show  sufficient  promise  in  lowering 
attrition  to  warrant  representation  in  this  discussion. 

The  first  of  such  programs  began  in  Oakland,  California  approximately 
4 years ago. It was, and still is, designed for unemployed and economically
disadvantaged  youth,  17-22  years  of  age.  The  basic  attraction  to  the 
Program  is  the  enhanced  opportunity  for  a  job.  In  order  to  get  the  help 
needed  to  acquire  that  job,  one  must  graduate  from  a  10  week  military 
preparation  program  and  sign  for  a  six  year  Guard  obligation.  The 
greatest  emphasis  in  the  Program  of  Instruction  (POI)  is  given  to  military 
skills and basic literacy skills, but also to career assessment and
pre-employment training. The military skills component is primarily physical
fitness  training,  drills  and  ceremonies,  and  military  field  trips. 

Basic  skills  training  consists  mostly  of  reading,  writing,  and  mathematics, 
but  with  some  "survival  training."  Survival  training  places  a  heavy 
emphasis  upon  the  importance  and  value  of  community  awareness  and  service, 
"banking,"  "nutrition,"  "cultural  studies,"  and  other  fundamental  education. 
Career  assessment  takes  up  one  (1)  week  of  the  program,  and  consists  of 
testing,  evaluation,  and  vocational  counseling.  Pre-employment  training 
occupies  about  14  weeks  and  addresses  a  plethora  of  concerns  such  as  how 
to  complete  an  application  form,  how  to  prepare  a  resume,  how  to  conduct 
oneself  in  an  interview  and  on  the  job,  and  introduction  to  selection 
tests  used  by  employers,  and  the  like.  The  Program  focuses  upon  the 
development  of  coping  skills  for  employment  and  military  settings, 
discipline,  civics,  teamwork,  and  the  skills  to  pass  the  Armed  Services 
Vocational  Aptitude  Battery.  All  of  this  adds  up  to  $1500  to  $2000  per 
pupil.  Perhaps  as  few  as  20%  of  the  applicants  are  accepted.  Of  this 
20%,  approximately  50%  graduate.  Rejections  and  failures  derive  primarily 
from  substandard  literacy  skills,  and  negative  attitudes  toward  the 
military.  At  least  90%  of  the  graduates  successfully  meet  Army  screening 
criteria,  and  proceed  to  Basic  Training.  Of  interest  for  the  active 
Army  is  that  there  has  been  only  about  a  10%  attrition  rate  among  such 
pretrained  recruits  throughout  Basic  and  Advanced  Individual  Training 
(AIT).  Non-pretrained  recruits  have  attrited  at  approximately  double 
that  rate  during  the  full  training  phase.  Also  of  potential  interest  is 
that perhaps as many as 1/1 indicate that they would prefer to remain
with the active Army rather than return to their communities and fulfill
their ARNG obligation. At present, upon successful completion of Army
training, the new Guardsmen return to their community where they will
receive help in getting MOS-related jobs, and attend monthly drills and
summer  training.  To  private  sector  employers,  perhaps  the  most  attractive 
feature  of  the  Program  derives  from  the  credibility  of  military  training. 
Employers  appear  to  be  more  willing  to  hire  a  youth  who  has  demonstrated 
the  discipline  assumed  to  be  required  to  successfully  complete  Basic  and 
AIT.  At  present  the  original  Oakland  program  has  been  adapted  to  Los 
Angeles  and  Sacramento,  California,  Seattle,  the  tri-city  area  in  Washington 
state,  and  Milwaukee,  Wisconsin. 
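
The stage-by-stage rates quoted above can be chained to show roughly how many applicants end up as trained Guardsmen. This is a back-of-envelope sketch: the rates are the approximate figures given in the text ("perhaps as few as 20%", "approximately 50%", "at least 90%", "about a 10% attrition rate"), while the starting pool of 1000 applicants and the function name `funnel_yield` are assumptions for illustration.

```python
def funnel_yield(applicants, acceptance, graduation, screening_pass, training_attrition):
    """Chain the quoted stage rates and return the head count after each stage."""
    accepted = applicants * acceptance
    graduates = accepted * graduation
    entering_training = graduates * screening_pass
    completing_training = entering_training * (1.0 - training_attrition)
    return {
        "accepted": accepted,
        "graduates": graduates,
        "entering_training": entering_training,
        "completing_training": completing_training,
    }


# Hypothetical pool of 1000 applicants, using the approximate rates in the text.
stages = funnel_yield(1000, 0.20, 0.50, 0.90, 0.10)
for stage, count in stages.items():
    print(f"{stage}: {count:.0f}")
```

On these approximate figures about 81 of every 1000 applicants would complete Basic and AIT, which puts the low training attrition in the context of the heavy screening that precedes it.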

Realistic Expectations. The military has made several recent audiovisual
efforts specifically designed to provide new recruits with a
heightened  awareness  of  what  the  military  will  be  like  and  how  to  deal 
with  it.  It  has  been  hypothesized  that  enlisted  attrition  should  in 
some  degree  be  inversely  related  to  such  increased  awareness.  The  first 
of these efforts was conducted by the Marine Corps (circa 1978). An 80
minute  color  video  tape  was  developed  jointly  by  a  private  contractor 
and  the  Training  Support  Center  of  the  USMC  Parris  Island  Recruit  Depot. 

The film is considered a Realistic Job Preview (RJP), attempting to
provide  recruits  an  accurate  portrayal  of  what  life  in  the  Marine  Corps 
would  be  like.  It  includes  footage  from  all  phases  of  recruit  training 
complete  with  voice  overs  and  interviews.  The  film  was  evaluated  by 
Horner,  Mobley  and  Meglino  (1979)  based  upon  its  viewing  by  678  Parris 
Island  male,  first-term,  enlisted  recruits.  It  was  shown  after  swearing 
in,  during  the  second  full  day  at  the  Recruit  Depot.  Results  did  not 
show a significant association between film viewing and attrition by the
end of training, but did so in the hypothesized direction after 6 months
and 12 months.

It  would  be  of  interest  to  have  the  evaluation  of  the  above  film 
replicated  by  an  independent  research  group.  Such  an  independent  evaluation 
was  conducted  in  regard  to  a  San  Diego  (Navy)  version  of  the  Marine 
Corps  film.  The  San  Diego  film  involved  the  same  contractor,  and  was 
produced  by  the  Center  for  Naval  Technical  Training  in  1979.  It  is 
approximately 1 hour in length and shows actual Navy enlisted recruits,
DIs, chow lines, obstacle courses, and the like. No professional actors
were used. The evaluation of this film showed no significant differences
in attrition rates between those (n=1049) who viewed the film and those
(n=1002) who did not (Lockman, 1980). Similar "no impact" findings have
been  reported  by  a  recent  initial  evaluation  of  a  Great  Lakes  version  of 
this  film. 
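
A comparison of attrition between film viewers and non-viewers of this kind is commonly made with a two-proportion z-test; a minimal sketch follows. The group sizes (1049 and 1002) come from the text, but the attrition counts (105 and 110) are hypothetical, since the paper does not report them, and nothing here is meant to reproduce the analysis Lockman actually used. The function name `two_proportion_z` is likewise an assumption.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two independent proportions.

    x1, x2 are the numbers attriting; n1, n2 are the group sizes.
    Returns (z, p_value) using the pooled-proportion standard error.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p_value


# Hypothetical attrition counts: 105 of 1049 viewers, 110 of 1002 non-viewers.
z, p_value = two_proportion_z(105, 1049, 110, 1002)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With counts like these the difference of about one percentage point is well within sampling error, which is what a "no significant differences" finding amounts to for groups of roughly 1000.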

An  additional  Navy  effort  is  a  20  minute  recruit  "coping  skills" 
film  (Sarason,  circa  1979;  personal  communication).  The  film  content 
focuses  upon  "(a)  what  to  expect,  (b)  how  you  may  feel,  and  (c)  what  to 
do."  Again,  initial  evaluation  has  shown  no  impact  upon  attrition.  A  6 
month  follow-up  of  this  film  is  forthcoming. 


More recently, the Army has developed realistic expectations and
coping  skills  films.  The  realistic  expectations  film  is  a  28  minute 
color  video-tape  providing  information  about  the  content  of  Basic  Combat 
Training (BCT). It consists of four components: Introduction, Weapons
Training,  Individual  Tactical  Training,  and  Necessary  Testing  Activities 
(I  WIN).  This  videotape,  like  the  others,  shows  actual  situations, 
though  it  concentrates  on  training.  Though  it  has  a  professional  narrator, 
all  participants  are  actual  on-site  personnel  at  Ft.  Jackson,  South 
Carolina.  At  present,  the  film  is  operational  at  all  Army  installations 
where  purely  BCT  is  offered.  It  is  either  shown  within  the  context  of 
the  reception  center  or  slightly  later  in  the  BCT  unit.  It  has  only  been 
very  recently  that  this  film  was  made  operational,  and  no  evaluations 
are  available. 

Retraining.  Of  particular  interest  in  this  paper  are  enlisted 
retraining  programs.  These  are  basically  interventions  which  perhaps 
represent  the  closest  device  the  U.S.  Military  presently  employs  toward 
the  establishment  of  counter-attrition  programs,  per  se. 

Of  primary  interest  are  programs  designed  for  marginal  performers 
and/or  problem  personnel  who  show  potential  for  development  into  productive 
soldiers  or  sailors.  The  most  comprehensive  and  established  of  such 
programs  at  the  present  time  is  the  U.S.  Army  Retraining  Brigade  (USARB) 
at  Camp  Funston,  Ft.  Riley,  Kansas.  The  original  USARB  Program,  known 
as  the  Army  Correctional  Training  Facility,  became  operational  in  1968. 

It  represented  a  landmark  for  the  Army  correctional  system  due  to  its 
emphasis  "upon  systematic  restoration  and  utilization  of  potentially 
wasted  manpower."  (USARB  Annual  Report,  1980).  During  its  first  4 
years,  the  Correctional  Training  Facility  received  over  23,000  prisoners 
from  Army  stockades  worldwide  and  returned  over  19,000  for  subsequent 
service. 

Physical  training  is  the  feature  most  emphasized  in  the  present 
USARB  POI.  It  is  deliberately  designed  to  exert  sustained  physical  and 
mental  stress  within  a  spartan  military  environment.  Mental  stress  is 
generated  from  continued  observation,  daily  evaluations  on  a  variety  of 
dimensions,  peer  pressure,  and  high  performance  standards. 

USARB  graduates  are  given  a  special  Enlisted  Evaluation  Report  (EER) 
after  60  days  in  their  new  duty  assignments.  Of  457  FY80  graduates 
returned to duty, 71% were rated as promotable immediately or ahead of
their peers. Only 10% were rated nonpromotable. This seems a remarkable
turnaround considering that only about 4 months previously, these individuals
were awaiting court-martial in an Installation Detention Facility (or
stockade).

A significant outgrowth of USARB is a co-located program at Ft.
Riley called the Individual Effectiveness Course (IEC). This program
became operational in 1977 and is significant in that it has served as a
model program for light offenders who are regarded as having potential
for productive duty. The IEC is distinguished from USARB in that it
is for Ft. Riley enlisted personnel only, is a non-confinement facility,
and is designed to counteract maladaptive propensities before significant
punishable offenses actually occur. The POI is 6 weeks long and otherwise
is  very  similar  to  USARB.  There  may  be  as  many  as  50  trainees  in  each 
class,  who  may  or  may  not  graduate.  Recommendations  as  to  retention  or 
elimination  are  made  to  the  unit  commander  in  each  individual  case. 

In a study of 197 IEC trainees, 54% graduated and returned to duty,
40% were recommended for various administrative separations, and 6% were
lost  through  AWOLs  or  medical  profiles.  At  initial  referral,  unit 
commanders  indicated  probable  actions  to  be  taken  within  30  days  if  the 
soldiers  had  not  been  admitted  to  the  IEC.  Predicted  actions  were  33% 
Expeditious  Discharges,  32%  Chapter  13s  (unsuitability),  27%  miscellaneous 
transfers  and  administrative  separations,  and  8%  Article  15s.  That  is, 
some  punitive  action  was  expected  for  all  197  trainees.  Fifty-four  percent 
of these 197 graduated from the program, and returned to duty. As with
USARB,  EERs  were  conducted  for  these  graduates  60  days  after  returning 
to  duty.  In  terms  of  advancement  potential,  nearly  half  were  rated  as 
promotable  immediately  or  ahead  of  their  peers.  This  seems  to  stand  in 
marked  contrast  to  the  above  dim  prognosis  prior  to  IEC  training. 

A  counterpart  Navy  program  became  operational  in  1979  after  having 
carefully  studied  USARB  and  the  IEC.  This  program,  called  the  Behavioral 
Skills  Training  Unit  (BEST)  is  located  at  the  Naval  Amphibious  Base  in 
Little Creek, Virginia. It is shorter in duration than the IEC (4
weeks instead of 6) but again heavily emphasizes physical training such
as  daily  distance  running,  an  obstacle  course  twice  weekly,  and  group 
sports.  Additional  emphasis  is  placed  upon  individual  and  group  counseling, 
goal  setting,  and  academic  advancement.  The  mission  of  BEST  is  to 
provide behavioral skill training to low and marginal performing first
term enlisted personnel that will enable them to successfully complete
their obligated service. Criteria for admission include no disciplinary
action  pending,  at  least  2  years  of  active  obligated  service  remaining, 
and  the  potential  to  complete  enlistment,  but  unlikely  to  do  so  given 
the  present  demeanor  and  record  of  achievement. 

As  of  June  1981,  BEST  had  enrolled  47  classes,  averaging  approximately 
24 each, for a total of 1145 trainees. Of these, 86% graduated, with
the remainder either failing the program or being returned to their
Command. All participants, including nongraduates, are evaluated at 6
and 12 month intervals subsequent to BEST training. Of 700 6 month
evaluations, 69% were rated as performing average or above. Six months
prior  to  BEST  training,  78%  were  awarded  non-judicial  punishments  and/or 
courts  martial,  whereas  6  months  after  BEST,  only  33%  received  similar 
punishments.  Twenty-two  percent  have  received  recognition  for  outstanding 
performance,  and  an  additional  22%  have  been  promoted  at  least  once. 

The  12  month  evaluations  reveal  a  similar  pattern,  with  even  greater 
percentages  (79%)  having  received  recognition  for  outstanding  performance 
or  at  least  one  promotion.  Again,  as  with  USARB  and  the  IEC,  a  dramatic 
turnaround seems indicated.

An additional Army program clearly shows promise as a potentially
productive counter-attrition program. This program, called the Intensified
Training Unit (ITU), is co-located with the Installation Detention Facility
(stockade) at Ft. Carson, Colorado. The ITU seems unique at the present
time in that it may be the only Army Correctional Custody Facility (CCF)
dedicated to a retraining rather than a work detail orientation. Again,
USARB and the IEC served as models for the development of the ITU POI.
The POI is 30 days, and is consistent in content and objectives with programs
already  described.  The  ITU  received  its  retraining  mandate  in  1978  as  a 
part  of  an  effort  by  the  Ft.  Carson  Law  Enforcement  Command  to  upgrade 
its  correctional  program  to  a  more  remedial  as  opposed  to  custodial 
format.  Like  other  CCFs,  the  ITU  is  a  non-confinement  unit,  providing 
for close supervision over soldiers who commit minor infractions. A key
limitation of the ITU at present is that a punishable offense is required
for  admission.  A  review  of  statistical  data  maintained  for  a  three 
month  period  in  1978  showed  that  of  67  correctees  who  completed  the  ITU 
and  returned  to  their  units,  42  have  since  received  performance  ratings 
of  good  to  outstanding.  More  substantial  data  are  needed  for  a  complete 
evaluation  of  the  effectiveness  of  the  ITU,  but  preliminary  indications 
suggest  positive  outcomes  similar  to  previously  described  programs. 

Other  similar  programs  could  be  mentioned  that  now  exist,  or  are 
getting  under  way,  throughout  the  Armed  Forces.  The  above  programs  are 
among the oldest, best established, and best evaluated to date. There are
indications  that  such  programs  are  proliferating  at  the  local  level.  The 
ITU,  for  example,  has  sent  packets  of  information  to  at  least  a  half 
dozen  interested  installations  thus  far.  Both  the  Defense  Audit  Service 
and the Army Audit Agency have made their own analyses of several of the
above programs; both concluded that the programs are cost-effective and
recommended expansion.

Data on the Federal Criminal Investigating Series were taken from the
Central  Personnel  Data  File  (the  main  repository  for  employment  history 
and  demographic  information  of  Federal  civilian  employees),  for 
1977-1980.  Four  year  trends  in  the  size  and  race/sex  composition  of  the 
full-time  series  population  were  developed.  It  was  found  that  the  total 
population  size  was  rather  stable  during  the  past  four  years,  but  that 
various  race/sex  groups  experienced  considerable  growth  or  decline. 

The  trend  data  were  used  as  the  basis  of  projections  for  the  1981-1985 
period.  Among  the  assumptions  upon  which  the  projections  were  based  was 
that the 1981-1985 period would be similar to the preceding four years;
probable  divergences  were  pointed  out,  and  their  effects  speculated  upon. 

The  usefulness  of  data  such  as  those  examined  in  this  paper  for 
purposes of forecasting and human resources planning was considered, and
determined to be positive, if incomplete.

Data Source


Except  where  otherwise  noted,  figures  in  this  report  are  based  on  the  entire 
population  of  criminal  investigators  working  in  the  United  States  during 
1976-77  to  1979-80.  Records  of  all  full-time  permanent  employees  of  the 
criminal investigating series contained in the Central Personnel Data File
(the  main  repository  for  employment  history  and  demographic  information  of 
civilian  employees  of  the  Federal  Government)  were  included  in  the  data 
analyses.  Thus,  obtained  figures  are  more  properly  regarded  as  population 
parameters  than  sample  statistics.  Among  other  things,  this  renders  the 
use  of  tests  of  statistical  significance  largely  inappropriate. 

It  should  be  noted  that  any  yearly  figure  is  as  of  June  30  of  that  year; 
a  one  year  period  is  from  July  1  of  a  year  to  the  next  June  30. 

This paper focuses on total population and on race/sex composition, but it is
part of a larger study which also examines other demographic characteristics and
personnel dynamics such as accessions, losses and inter-occupational movement.

Employment in the 1977-1980 Period


Population  Changes 


Total  Population.  Data  from  the  Central  Personnel  Data  File  (CPDF)  indicate 
that  the  number  of  full-time  permanent  employees  in  the  criminal  investigating 
series  changed  very  little  in  the  three  years  from  June  30,  1977  to  June  30,  1980. 
As  Table  1  shows,  the  number  of  full-time  permanent  employees  on  board  on 
June  30,  1980  was  19,064.  This  is  less  than  one  percent  larger  than  the 
June  30,  1977  figure  of  18,920.  Changes  across  one  year  intervals  were  never 
greater than 1 1/3 percent (which was the amount of increase between June 30,
1979 and June 30, 1980). The average yearly change in the total full-time
permanent population of Federal criminal investigators was .28%.

Table 1

Full-Time Permanent Employees in the Criminal Investigating Series, 1969-1972
and 1977-1980

Series Totals

June 1969  June 1970  June 1971  June 1972  June 1977  June 1978  June 1979  June 1980
  14,015     15,196     16,834     18,852     18,920     18,734     18,806     19,064

Percent Change

1969-1970  1970-1971  1971-1972  1969-1972  1972-1977  1977-1978  1978-1979  1979-1980  1977-1980  1969-1980
    8.4       10.8       11.9       35.0        .36       -.98        .38       1.37        .76       36.0

Table 1 also compares the number of full-time permanent employees in this series
during  the  three  most  recent  years  to  the  series  population  during  the  period 
June 1969-June 1972 (U.S. Civil Service Commission, 1973). The two sets of
figures are not exactly equivalent since the data from the 1969-1972 period
are estimates based on a sample of all General Schedule employees in the
criminal  investigator  series.  The  small  number  of  non-permanent  full-time 
employees,  however,  increases  the  comparability  of  the  two  sets  of  figures. 

It  can  be  seen  from  the  information  in  the  table  that  the  years  1969-1972  were 
years  of  growth  for  the  criminal  investigator  series,  with  an  average  annual 
increase of 10.4%. The difference between 1972 and 1977, however, is .36%, or
about 1/3 of one percent; and the change in the criminal investigator population
between June 1969 and June 1980 of 36% is only one percentage point greater than the
35%  change  between  June  1969  and  June  1972.  These  data  provide  support  for 
the  claim  that  the  total  population  in  series  1811  has  been  relatively  stable 
for  the  past  few  years,  although  it  is  possible  that  fluctuations  in  different 
directions  between  1972  and  1977  balanced  out  to  present  a  picture  of  seeming 
lack of change. (Directly comparable data, i.e., series totals as of June 30,
1972-77, are not available. The series population totals which are available,
as of October 31 of the relevant years, cannot be compared to the June figures
because of seasonal differences.)
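
The percent-change figures in Table 1 follow directly from the series totals. The short Python sketch below illustrates that arithmetic; the function name `percent_change` and the dictionary layout are illustrative assumptions and are not part of the original study.

```python
# Series totals as of June 30 of each year, taken from Table 1.
SERIES_TOTALS = {
    1969: 14015, 1970: 15196, 1971: 16834, 1972: 18852,
    1977: 18920, 1978: 18734, 1979: 18806, 1980: 19064,
}


def percent_change(start_year, end_year, totals=SERIES_TOTALS):
    """Percent change in the series total between two June 30 counts."""
    start, end = totals[start_year], totals[end_year]
    return 100.0 * (end - start) / start


if __name__ == "__main__":
    # Intervals discussed in the text (hypothetical selection for display).
    intervals = [(1972, 1977), (1977, 1978), (1978, 1979),
                 (1979, 1980), (1977, 1980), (1969, 1980)]
    for first, last in intervals:
        print(f"{first}-{last}: {percent_change(first, last):.2f}%")
```

Run against the Table 1 totals, the sketch gives a change of about .36% between 1972 and 1977 and about .76% between 1977 and 1980, which is the stability the text describes.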

Sub-groups.  Although  the  population  as  a  whole  displays  considerable  stability, 
there  is  a  good  deal  of  variability  in  the  numbers  and  proportions  of 
different  race/sex  sub-groups.  Table  2  presents  these  data  for  the  years 
1977-1980. It is obvious that the occupants of the criminal investigator
position are overwhelmingly non-minority (91.7% in 1980) and male (96.1%
in  1980).  Nevertheless,  an  increasing  number  of  women  and  minorities  have 
entered  this  occupation  since  1977,  while  the  proportion  of  non-minority  males 
has  decreased  by  about  4%. 

As  Table  2  shows,  in  1977  the  population  of  criminal  investigators  in  the 
Federal  service  was  98.4%  male,  and  94.1%  non-minority.  By  1980  these 
figures  had  decreased  to  96.1%  and  91.7%  respectively.  There  has  been  a 
slight  decline  in  the  proportion  of  males  each  year,  as  well  as  in  the 
proportion  of  non-minority  employees,  due  entirely  to  a  decline  in  the 
proportion  of  non-minority  males.  From  1977  to  1980,  there  has  been  an  overall 
increase  in  the  number  of  females  of  150%;  the  average  yearly  increase  was  35.92%. 
The total proportion of women in this occupation, however, remains low:
3.9% as of June 1980. The increase in non-minority females has been 152.55%
overall, with a yearly average increase of 36.41%; their present total
proportion is 3.4% of the total series population. The increase for minority
females has averaged 32.77% annually, for a total increase from June 1977
to June 1980 of 135.71%. The current proportion of the series population
represented by minority women remains quite small, about .5 percent. Total
minority representation has increased by 42.96% in the period 1977-1980,
averaging 12.73% a year. The current proportion of minorities is 8.3%.

The increase for minority males is also consistent with this general trend.
The overall increase was 39.31%; the average yearly increase 11.77%. The
proportion of minority males in the total series has increased from 5.6%
to 7.8% between June 1977 and June 1980.
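
The text distinguishes the overall increase for a group from its average yearly increase. The following Python sketch shows one way the two figures can be derived from the June 30 counts for females in Table 2; the function names are illustrative assumptions, and the average is taken here as the simple mean of the three year-over-year percent changes.

```python
# June 30 counts of female criminal investigators, 1977-1980 (from Table 2).
FEMALE_COUNTS = [297, 415, 525, 743]


def yearly_percent_changes(counts):
    """Year-over-year percent change for each consecutive pair of counts."""
    return [100.0 * (later - earlier) / earlier
            for earlier, later in zip(counts, counts[1:])]


def average_yearly_increase(counts):
    """Arithmetic mean of the year-over-year percent changes."""
    changes = yearly_percent_changes(counts)
    return sum(changes) / len(changes)


def overall_percent_change(counts):
    """Percent change from the first count to the last."""
    return 100.0 * (counts[-1] - counts[0]) / counts[0]


if __name__ == "__main__":
    print([round(c, 2) for c in yearly_percent_changes(FEMALE_COUNTS)])
    print(round(average_yearly_increase(FEMALE_COUNTS), 2))
    print(round(overall_percent_change(FEMALE_COUNTS), 2))
```

For females this yields an average yearly increase of about 35.92% and an overall increase of roughly 150%, in line with the figures quoted above.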


Table 2

Proportions of Full-time Permanent Criminal Investigators Belonging to
Various Race/Sex Groups, 1977-1980

Series Population

                               Number                  Per cent of Population*
                       1977    1978    1979    1980    1977   1978   1979   1980
Male                 18,623  18,319  18,281  18,321    98.4   97.8   97.2   96.1
Female                  297     415     525     743     1.6    2.2    2.8    3.9
Non-Minority         17,812  17,550  17,422  17,480    94.1   93.7   92.6   91.7
Non-Minority Male    17,557  17,192  16,973  16,836    92.8   91.9   90.2   88.3
Non-Minority Female     255     358     449     644     1.3    1.8    2.4    3.4
Minority              1,108   1,184   1,384   1,584     5.9    6.3    7.4    8.3
Minority Male         1,066   1,127   1,308   1,485     5.6    6.0    7.0    7.8
Minority Female          42      57      76      99      .2     .3     .4     .5

* Percentages may not add up to exactly 100% due to rounding

Per cent Change

                     1977-1978  1978-1979  1979-1980  1977-1980
Male                    -1.63       -.21        .22      -1.62
Female                  39.73      26.51      41.52     150.00
Non-Minority            -1.47       -.73        .33      -1.86
Non-Minority Male       -2.08      -1.27       -.81      -4.11
Non-Minority Female     40.39      25.42      43.43     152.55
Minority                 6.86      16.89      14.45      42.96
Minority Male            5.72      16.06      13.53      39.31
Minority Female         35.71      32.33      30.26     135.71


Employment  Projections  to  1985 


About  Projections 


Projections  are  used  to  provide  information  about  the  future  growth  or 
decline  of  a  population.  A  projection  does  not  necessarily  represent  the 
"best  guess"  about,  or  most  probable  estimate  of,  a  population  at  a  given  point 
in  the  future;  rather,  it  is  a  picture  of  the  population  based  on  a  set  of 
assumptions and hypothesized relationships which are more or less likely.
Thus, some projections may be quite unlikely to occur, but are made to illustrate
a "what-if" situation of theoretical importance; others are based on what the
author of the projection believes to be the most probable future occurrences.
This latter class of (most probable) projections is properly termed forecasts.

Because  the  accuracy  of  assumptions  is  not  always  certain,  and  because 
the  assumptions  underlying  a  projection  have  such  a  great  effect  on  the 
projection,  it  is  considered  desirable  to  develop  more  than  one  set  of 
projections  of  population  change.  For  example,  a  series  of  projections  recently 
released by the Bureau of Labor Statistics (Fullerton, 1980) displayed three
labor  force  growth  scenarios  for  the  period  1985  to  1995.  High,  intermediate 
and  low  growth  projections  were  made  which  reflect  different  overall  participation 
rates  for  males  and  females,  blacks  and  whites,  for  the  four  major  race/sex 
sub-groups,  and  for  different  age  groups.  The  participation  rates  assumed  for 
each  scenario  were,  in  turn,  based  upon  three  (high,  middle  and  low)  series 
of  projections  of  general  population  change  produced  by  the  Bureau  of  the 
Census.

For  general  population  projections  there  are  three  major  components  of 
population  change  about  which  assumptions  are  typically  made:  fertility 
(births),  mortality  (deaths)  and  migration  (population  movement  into  and  out 
of  the  area).  Projections  may  reflect  assumptions  about  how  these  factors 
will change, combine or interact to affect the future population. For labor
force studies, such as the present one, assumptions may be made about accessions,
separations and migrations between jobs or occupations, as well as about total
labor  force  participation  or  size. 

Assumptions  and  Components  of  Projections  in  the  Current  Study 

Because  of  severe  limitations  in  the  data  resources  available  to  the  current 
project,  it  was  possible  to  make  only  very  simple  projections  of  future 
changes  in  the  population  of  the  1811  series.  The  projections  of  series 
and  sub-group  populations  and  of  their  demographic  characteristics  are  based 
on  the  assumption  that  the  trends  of  the  recent  past  (1977-1980)  will  continue 
during  the  period  of  projection  (1981-1985).  In  other  words,  it  was  expected 
that  the  rate  of  change  in  the  population  during  1981-1985  will  be  similar  to 
that  experienced  during  the  past  four  years.  Furthermore,  contributions  of 
changes  in  each  of  the  three  major  components  of  population  change  (namely 
accessions,  separations  and  occupation  changers)  are  not  used  as  the  bases 
for assumptions about total series population change. Rather, total population
change is projected on the basis of the way the total population changed in the recent past.
Accessions, losses and occupation changers have been examined individually
in another part of this study and projections of their future growth are made.
These  data  are  not,  however,  then  used  to  "correct"  or  revise  the  population 
projections  presented  in  the  first  half  of  the  report.  This  is  one  limitation 
of  the  present  study. 

A  major  risk  involved  in  using  average  growth  in  the  period  1977-1980  as 
the  basis  for  projections  for  1981-1985  is  that  the  base  period  may  have  been 
atypical  or  unusual  in  some  way.  For  example,  the  active  recruitment  of  females 
and minorities during the base period may have led to a higher proportion of
female and minority criminal investigators than would be the case under other
conditions. Active minority and female recruitment may not continue through
the 1980s. A parallel but opposite risk is that the period of projection may be
one that departs from even long-standing trends in the target occupation because
of changes in the external environment. Thus, severe budget constraints, which
appear to be the trend of the future, may interfere with the past pattern of
population  change.  To  the  extent  that  the  1981-1985  period  differs  in  any 
component  of  population  change  from  the  1977-1980  period,  the  projections 
made  in  this  report  may  diverge  from  actual  trends. 

Development  of  Projections 

Two methods of making simple linear projections of series population and
race/sex composition, based on the CPDF data reviewed above, were tried. One
method,  which  was  based  on  the  average  annual  proportion  of  change,  resulted  in 
a  bloated  population  figure,  and  severe  disparities  in  projected  population 
size  depending  on  whether  the  projection  was  based  on  growth  of  the  total  series 
or growth of individual race/sex sub-groups. This method was rejected as
unrealistic. The other method, which was based on the change in average annual
population frequencies, produced a scenario with slow overall growth and more
rapid, but contained, growth in the number of minority and female criminal
investigators.  This  second  method  was  viewed  as  more  realistic,  and  was 
used  as  the  basis  for  further  projections.  The  two  methods  and  their  outcomes 
are  discussed  in  more  detail  below. 

The  Proportions  Method.  Figures  projected  by  this  method  are  based  on  the 
assumption that the population of the series, and of each of the subgroups
included in the report, will continue to increase or decrease by the same percentage
as in previous years. Using this method results in two widely different figures
depending upon whether the series population projected for 1985 is based on the
average annual proportionate increase in the series as a whole during 1977-1980,
or on the sum of projections made for the amount of growth in each of the
four major race/sex sub-groups. Projections for each of the sub-groups are
derived from their individual average annual proportionate growth. The projected
population size based on the increase in the series as a whole is 19,314. The
projected population size for the total population built up from projected
increases in each of the four sub-groups is 24,099. The sum of the parts is,
rather anomalously, larger than the whole. This is obviously extremely
problematic, and argues against using this method for making projections.

The reason for the discrepancy is that the population as a whole had a low
average proportionate increase, but the various sub-groups had widely different
growth rates which this method does not "force" to stay within a limit set by
the total. In particular, minorities and females had rather high rates of
growth, while non-minority males had a low rate of decline. Since the minority
and female sub-groups are so small, even a slight numerical increase results
in a rather spectacular proportionate increase. For example, if in 1977 there
were only 100 female investigators, an increase of 25 new female investigators
would represent an increase of 25 per cent. The next year, the expanded base of
125 female investigators would require about 30 women to achieve a 25% increase;
the third year's base of 155 would require about 37 women to achieve a 25%
increase, and so on. Thus, as the base population expands, the same numerical
increase each year would yield a declining rate of increase; to sustain the
same rate of increase requires ever larger numerical growth. As the numbers
of females and minorities begin to swell, it becomes more unlikely that they
will actually sustain the rate of growth predicted by this projection. It is also
extremely unrealistic to expect the series to reach a size of 24,099 by 1985;
this is a 27% expansion which is not consistent with any of the other projections
in the report. It is more reasonable to assume that real growth will be closer to the
figure of 19,314 calculated from the single total population figure, and that the
potential growth in any individual sub-group will be necessarily limited by total
series population growth. Figures generated by this method are shown in Table 3.

This  method  was  rejected  as  a  basis  of  making  projections  of  series  change 
because the resulting figures were so problematic and unrealistic. The data
produced from this method of calculation do, however, serve one function. They
indicate  that  it  is  extremely  unlikely  that  the  present  rate  of  growth  of 
minorities  and  women  in  the  1811  series  can  continue  unabated. 
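
The discrepancy produced by the proportions method can be illustrated with a short Python sketch. It assumes, as an interpretation not stated explicitly in the report, that each average annual percentage from Table 3 is compounded once a year over the five years from June 1980 to June 1985; the names `project_compound`, `BASE_1980` and `ANNUAL_PCT` are hypothetical. Under that assumption the results come close to, but do not exactly reproduce, the Table 3 figures.

```python
# June 1980 base populations (Table 2) and average annual percent
# increases for 1977-80 (Table 3).
BASE_1980 = {
    "Non-Minority Male": 16836,
    "Non-Minority Female": 644,
    "Minority Male": 1485,
    "Minority Female": 99,
}
ANNUAL_PCT = {
    "Non-Minority Male": -1.37,
    "Non-Minority Female": 50.85,
    "Minority Male": 13.10,
    "Minority Female": 45.24,
}
TOTAL_1980 = 19064
TOTAL_ANNUAL_PCT = 0.26


def project_compound(base, annual_pct, years=5):
    """Grow `base` by a fixed percentage each year (assumed compounding)."""
    return base * (1.0 + annual_pct / 100.0) ** years


def total_from_subgroups(years=5):
    """Sum of the independently projected sub-group populations."""
    return sum(project_compound(BASE_1980[g], ANNUAL_PCT[g], years)
               for g in BASE_1980)


def total_independent(years=5):
    """Projection based on the growth rate of the series as a whole."""
    return project_compound(TOTAL_1980, TOTAL_ANNUAL_PCT, years)


if __name__ == "__main__":
    print(f"Total from sub-groups:  {total_from_subgroups():,.0f}")
    print(f"Total calculated indep.: {total_independent():,.0f}")
```

Because nothing in the calculation constrains the sub-group projections to sum to the series total, the fast-growing small groups push the built-up total several thousand above the independently projected total, which is the anomaly the text describes.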

Table 3

Discrepancies between Projections of Criminal Investigator Population when
Calculated for Total Series Population, or as a Sum of Sub-groups, based
on Proportions Method

                           Average Annual %     Projection of Pop.
                           Increase 1977-80     to June 1985
Non-Minority Male               -1.37               15,715
Non-Minority Female             50.85                4,993
Minority Male                   13.10                2,749
Minority Female                 45.24                  642
Total from Sub-groups                               24,099
Total Calculated Indep.           .26               19,314

The Frequencies Method


Total population. This series of projections is based on the average
numerical change in population or sub-population over the past three years.
The average annual frequency by which the series, and each of its race/sex
sub-groups, increased or decreased during 1977-80 is projected as the
future annual increase or decrease.

For example, female criminal investigators have increased at an average of 149 per
year for the past three years (although the actual numbers have varied from
year to year). To make the projection, the number of females in 1980, which
was 743, was used as the base. For each year included in the projection 149
females were added. Thus, it is projected that in June of 1981 there will be 892 females,
in June of 1982 there will be 1,041 females, and by June of 1985 there will be
1,488 female criminal investigators. Similar projections were made for all
four race/sex sub-groups included in this study. As Table 4 shows, if present
trends continue, it is projected that the 1985 population of criminal investigators
will total 19,304. This is about 1.3% greater than the June 1980 series
population of 19,064. Males will make up 92.3% of this group, with non-minority
males accounting for 81.0% of the total. Females are projected to make up 7.7%
of the total, with minority females accounting for only about 1%.

This  method  produced  projections  which  appeared  to  be  of  realistic  size 
and  internally  consistent.  The  figures  it  produced  were  therefore  accepted 
for  use  in  this  project.  This  should  not  be  taken  as  meaning  that  this  set 
of  projections  represents  the  most  probable  future  events;  their  probability 
or likelihood in comparison to projections other than those produced by
the  proportions  method  discussed  above  has  not  been  assessed. 
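
The frequencies method amounts to adding a fixed number of people to the June 1980 base for each projected year. The Python sketch below illustrates it using the average annual changes reported in Table 4; the names `project_linear`, `BASE_1980` and `ANNUAL_CHANGE` are illustrative assumptions, and decreases are entered as negative numbers.

```python
# June 1980 base populations and average annual numerical changes
# for 1977-1980, as reported in Table 4 (decreases entered as negatives).
BASE_1980 = {"Total": 19064, "Male": 18321, "Female": 743}
ANNUAL_CHANGE = {"Total": 48, "Male": -101, "Female": 149}


def project_linear(base, annual_change, years):
    """Add the same numerical change to the base for each projected year."""
    return base + annual_change * years


def projection_series(group, first_year=1981, last_year=1985):
    """Projected June populations for one group, keyed by year."""
    return {
        year: project_linear(BASE_1980[group], ANNUAL_CHANGE[group],
                             year - 1980)
        for year in range(first_year, last_year + 1)
    }


if __name__ == "__main__":
    for group in BASE_1980:
        print(group, projection_series(group))
```

With these inputs the female series runs from 892 in 1981 to 1,488 in 1985, and the projected male and female figures for 1985 sum to the projected total of 19,304, which is the internal consistency noted above.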

Table 4

Projections of Criminal Investigator Population by Method of Frequencies
1981-1985

                      Average annual                     (PROJECTED)
                      increase         June     June     June     June     June     June
                      1977-1980        1980     1981     1982     1983     1984     1985
Total                      48        19,064   19,112   19,160   19,208   19,256   19,304
Male                     -101        18,321   18,220   18,119   18,018   17,917   17,816
Female                    149           743      892    1,041    1,190    1,339    1,488
Non-Minority             -111        17,480   17,369   17,258   17,147   17,036   16,925
Non-Minority Male        -240        16,836   16,596   16,356   16,116   15,876   15,636
Non-Minority Female       131           644      775      906    1,037    1,168    1,299
Minority                  159         1,584    1,743    1,902    2,061    2,220    2,379
Minority Male             140         1,485    1,625    1,765    1,905    2,045    2,185
Minority Female            19            99      118      137      156      175      194

Conclusions


This  paper  examined  in  detail  two  of  several  important  trends  in 
the  demographic  characteristics  and  personnel  dynamics  of  the  criminal 
investigating  series  during  1977-1980  which  were  studied  by  the  author. 

It  is  unfortunate  that  space  and  time  limitations  precluded  an  analysis 
of  the  other  demographic  and  personnel  dynamics  trends  in  this  forum,  but  it 
is  hoped  that  the  few  data  which  were  presented  capture  the  flavor  of  the 
larger  study  from  which  they  were  taken. 

It is believed that the systematic reporting of detailed Federal single-occupation
data is not duplicated elsewhere. Data on the demographic
characteristics  of  a  series  population,  such  as  age,  education  and  grade,  as 
well  as  on  population  size  and  race/sex  composition,  provide  a  historical 
snapshot  of  the  population  for  the  period  studied.  In  the  present  case  we 
have  a  picture  of  what  the  criminal  investigating  series  looked  like  on 
June  30  of  each  year  from  1977  to  1980.  Personnel  dynamics  data  on  accessions, 
losses  and  inter-occupational  movement,  complete  the  picture  by  presenting 
a  panorama  of  the  changes  in  the  population  during  the  period  of  study.  It  is 
important  to  realize,  however,  that  data  such  as  those  presented  here  are 
a faithful representation of the actual composition and history of the population
only to the extent that the data bases on which they are based are themselves
accurate. The accuracy of the present data bases, namely the Current
Status File and the Transaction History File of the Central Personnel Data
File, is dubious for some purposes, such as career tracking (Hirsh, 1981),
but  appear  to  be  sufficiently  accurate  and  reliable  for  relatively  macro 
planning  purposes  such  as  budgeting  or  examination  planning. 

Knowledge  of  the  demographic  characteristics  of  an  occupation's  population 
as  well  as  of  fundamental  characteristics  of  population  change  is  a  necessary 
component  of  any  workforce  planning  effort.  It  is  also  critical  for  making 
informed  policy  decisions  in  other  personnel  related  areas.  This  is  true 
whether  one  wishes  to  maintain  or  change  current  trends  in  the  population 
of the occupation. It is therefore intended that, in the future, we will
continue to analyze Federal-wide single occupation trends, and attempt to
improve the ways in which the resultant data can be used in the service of
human resources planning and decision making.

ABSTRACT


This paper describes how an Executive Steering Group (ESG) built a "management team"
consisting of members from three organizations primarily responsible for the
evaluation and feedback component of the Strategic Submarine Force SWS Training
Program described in Part I of this two-part presentation. PTEP (Personnel and
Training Evaluation Program) is the component within the SWS Training Program which
has a broad evaluation and feedback charter.

The ESG addresses policies, issues, and activities as they relate to the whole PTEP
evaluation responsibility within the training system. The variety, scope and complexity
of PTEP evaluation responsibilities that must be carried out in a dynamic
environment require close coordination, frequent communication and periodic
assessments/status reports from those assigned major tasking.

This paper discusses the dynamic environment and the factors at work when ESG meetings
were instituted, describes the composition of the group, tells how actions are
assigned and tracked, and outlines the contribution of the ESG to developing and maintaining
a functioning "management team" in support of the SWS Training Program.


INTRODUCTION 


The Personnel and Training Evaluation Program (PTEP) is the evaluation component
within the SWS Training Program (described in Part I). By providing feedback to
all of the other training system components, PTEP makes the SWS Training Program
a self-correcting system. PTEP procedures, structure, organization and information
handling systems were developed over a number of years and are constantly
being improved to meet and carry out the evaluation and feedback responsibilities.
These responsibilities include assessments of training facilities, hardware,
documentation, individual courses of instruction and overall knowledge and skill
of crew members. PTEP collects and analyzes data from many sources; provides
reports; and, when appropriate, makes recommendations to those responsible for
individual crews, schools, instructors and curriculum developers.

An  effective  evaluation  and  feedback  system  is  essential  to  the  development 
and maintenance of the SWS Training Program. PTEP evaluation is a continuous
process examining various aspects of the training system as data bases and personnel
change. New problems arise. New training courses are developed. Current
curricula are revised. Instructors and instructional staffs are rotated. Characteristics
of personnel entering the training pipeline change. New classes of
SSBNs are designed and built. Current SSBNs are backfitted with newer systems,
and other SSBNs are retired. PTEP has a very important charter which equates
to  a  difficult  and  demanding  set  of  responsibilities  in  a  dynamic  and  ever 
changing  environment. 

The  evaluation  methodologies,  statistical  analyses  and  basic  testing  philosophies 
for  the  SWS  Training  Program  have  remained  relatively  constant.  However,  the 
changing environment and the multiple requirements lead to a competition for
limited evaluation resources, and the dynamics of this evaluation environment
result in frequent realignments of project priorities. Therefore, managing
the PTEP evaluation efforts, coordinating separate but related assessments, and
communicating between and among the primary responsible organizations and agencies
became a very complex and demanding responsibility in and of itself. Hence the
focus of this paper: to communicate how those responsible for SWS PTEP have made
it easier to do a better job while keeping responsible parties informed.


BASIC  ENVIRONMENT  LEADING  TO  FORMATION  OF  AN  EXECUTIVE  STEERING  GROUP


The  SWS  Training  Program  was  (and  is)  so  dynamic  that  control,  coordination  and
communication  of  evaluations  in  this  environment  needed  immediate  and  increasing
management  attention.  A  suggestion  to  form  an  executive  steering  group  was  put
forth  and  accepted,  and  the  first  meeting  was  planned  for  January  1981.  An  agenda
was  developed  and  distributed,  and  expected  attendees  were  notified.

There  are  three  primary  participants  engaged  in  the  day-to-day  operations  of  PTEP 
evaluation  projects  (The  U.S.  Navy  Strategic  System  Program  Office,  Central  Test 
Site  and  detachments,  and  contractor  personnel).  These  participants  have,  as  one 
might  expect,  different  chains  of  command.  They  are  geographically  dispersed. 
PTEP  evaluation  responsibilities  continued  to  encompass  personnel,  the  training 
system,  hardware,  management  documentation  and  facilities.  In  this  environment
one  should  not  have  been  surprised  to  find  that  routine  control,  coordination,
and  communication  methods  would  have  been  strained  to  the  limit.  They  were.

And  in  fact,  there  was  a  growing  sense  of  frustration  and  even  some  doubt  as  to 
whether  all  of  the  evaluation  responsibilities  assigned  to  PTEP  could  be  managed 
effectively. 

It  should  be  pointed  out  that  the  SWS  Training  Program  takes  evaluation  very
seriously,  is  committed  to  evaluation  as  an  integral  part  of  the  system,  and
relies  upon  a  wide  range  of  evaluations  to  help  make  management  decisions
regarding  the  training  and  preparation  of  our  strategic  submarine  force.  For
example,  during  a  five  year  period  between  1976  and  1980  the  SWS  Training  Program 
conducted  eighty-eight  studies  that  produced  428  recommendations  related  to  most 
components  of  the  training  program,  its  operation,  and  the  evaluation  thereof. 

It  is  important  for  one  to  realize  the  magnitude  of  the  training  program  to  under¬ 
stand  the  complexity  of  the  PTEP  evaluation  requirements.  As  of  July  1981  the  SWS 
Training  Program  had  a  total  of  180  courses.  Ninety-five  of  these  courses  had 
completed  all  five  developmental  phases  and  had  been  promulgated.  The  content 
and  conduct  of  these  courses  are  considered  stable.  The  eighty-five  remaining 
courses  were  in  various  stages  of  development.  Eight  were  in  Stage  1  (scheduling
planning),  sixteen  in  Stage  3  (sample  curriculum),  forty-four  in  Stage  4  (pilot
course),  and  sixteen  in  Stage  5  (revisions,  final  course).  These  conditions  make
the  actual  conduct  of  evaluation  and  particularly  the  interpretation  of  the  results
even  more  difficult.

Another  measure  of  the  magnitude  of  PTEP  evaluation  responsibilities  can  be  seen 
from  another  set  of  data.  Approximately  126,000  records  were  created  during  a 
recent  one-year  period.  During  this  same  time  2,856  groups  (average  of  7.3 
examinees  per  group)  were  tested — 20,849  examinees'  tests  were  scored,  data  recorded 
and  results  analyzed.  All  of  this  testing  generated  85,828  original  pages  of 
reports  with  177,432  additional  pages  reduced  and  photocopied.  The  primary  PTEP 
data  files  used,  at  that  time,  consisted  of  106.79  million  characters  of  data 
storage. 
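
The workload figures quoted above can be cross-checked with a little arithmetic. The short Python sketch below is only an illustration: the variable names are hypothetical, and the per-examinee page figure is a derived number that does not appear in the paper.

```python
# Figures quoted in the paragraph above (one recent one-year period).
groups_tested = 2_856
avg_examinees_per_group = 7.3
examinees_reported = 20_849
original_pages = 85_828
copied_pages = 177_432

# Groups times average group size should roughly reproduce the examinee count.
estimated_examinees = groups_tested * avg_examinees_per_group

# Total report pages, and a derived (not source-stated) pages-per-examinee ratio.
total_pages = original_pages + copied_pages
pages_per_examinee = total_pages / examinees_reported

print(round(estimated_examinees))    # 20849, matching the reported count
print(total_pages)                   # 263260
print(round(pages_per_examinee, 1))  # 12.6
```

The product of groups and average group size rounds to the reported 20,849 examinees, so the quoted figures are internally consistent.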

PTEP  evaluation  responsibilities  include  personnel,  training  system,  hardware 
and  facilities.  The  testing  of  personnel  and  the  scoring  and  analysis  of  the 
data  represent  a  major  undertaking.  The  magnitude  of  this  evaluation  respon¬ 
sibility  is  characterized  in  many  ways.  New  tests  must  be  designed  and  devel¬ 
oped  to  replace  current  versions.  Attention  must  be  given  to  test  reliability 
and  validity  of  the  data  collected  as  curricula  change.  Another  aspect  of  the 
changing  character  of  this  environment  is  embodied  in  the  additions  to  SSBN
crews  and  the  losses  due  to  orders  ashore,  or  completion  of  an  enlistment  and
leaving  the  Service.  Modernizing  equipment  adds  still  another  dimension  to  the
evaluation  environment  as  older  classes  of  boats  are  retired,  newer  classes
are  backfitted  and  still  newer  ones  are  built.

This  is  the  real  world  of  PTEP  evaluation.  It  is  dynamic,  a  constant  challenge, 
and  one  that  requires  continuing  diligence  to  all  aspects  of  the  process.  One 
cannot  forget  the  need  for  rigorous  curriculum  development;  the  classroom  and 
laboratory  exercises;  the  test  design,  development  and  administration;  the 
analysis  of  the  large  quantities  of  data;  and  the  presentation  and  interpretation
of  the  results  to  those  who  need  to  know.

A  few  months  ago  it  became  evident  that  the  routine  methods  of  control,
coordination,  and  communication  within  the  PTEP  component  of  the  SWS  Training  Program
were  not  functioning  effectively.  The  initiation  of  an  Executive  Steering  Group 
was  an  attempt  to  gain  more  control  and  more  effectively  manage  the  variety  of 
PTEP  evaluation  projects.  We  wanted  to  deal  with  the  more  important  issues 
and  get  out  of  the  business  of  putting  out  brush  fires. 


INITIATION  OF  THE  EXECUTIVE  STEERING  GROUP


In  this  dynamic  pressure-packed  environment  we  convened  our  first  ESG  in  January 
1981.  Again  from  hindsight,  it  is  not  surprising  to  report  that  much  of  the  first 
meeting  was  devoted  to  communicating  between  and  among  ourselves  as  to  the  current 
status  of  projects,  perceptions,  and  problems  facing  us.  Several  action  items  were 
identified  and  responsibility  for  them  was  assigned  during  this  one-day  meeting. 

The  group  agreed  that  the  first  meeting  had  been  worthwhile,  and  that  we  should 
meet  about  once  a  month  for  the  next  few  months.  Subsequently,  minutes  of  the 
meeting  were  prepared  and  distributed  to  the  attendees. 

Later,  an  agenda  was  developed  and  distributed  for  the  February  meeting.  Again 
the  group  received/gave  status  reports,  adjusted  schedules  for  projects,  and
determined  action  items.  At  this  meeting  the  group  also  heard  and  considered
some  very  serious  internal  criticism.  The  ESG  found  that  by  providing  an
opportunity  to  raise  some  very  frustrating  and  potentially  counterproductive
issues  and  perceptions  and  by  dealing  with  them  in  a  constructive  manner,  we
were  maturing  as  a  group.  The  response  of  the  ESG  (in  session),  and  actions  by
individuals  as  a  result  of  the  discussions  constituted  a  significant  step  in  the 
development  of  a  genuine  "team"  spirit. 

Development  of  an  agenda,  subsequent  preparation  of  minutes,  providing  status 
reports,  and  realigning  of  priorities  and  schedules  became  routine.  ESG  meet¬ 
ings  were  conducted  in  January,  February,  March  and  April. 

The  next  major  milestone  in  the  development  of  the  ESG  "management  team"  involved 
the  use  of  a  computer  to  keep  track  of  the  growing  number  of  action  items.  During 
May  we  developed  a  relatively  simple  automated  system  for  tracking  our  action  items.
Each  action  item  was  (and  is)  assigned  a  six-digit  file  number.  The  first  two 
digits  equal  the  last  two  digits  of  the  calendar  year,  the  second  two  digits 
equal  the  month  of  the  ESG  meeting,  and  the  last  two  digits  equal  the  item  number 
used  to  record  the  essence  of  topics  discussed  in  the  ESG  meetings. 
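
The numbering scheme just described (two digits of year, two of month, two of item number) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the paper does not say what software the original system used, and the 0 to 99 item range is simply what two digits allow.

```python
from typing import Tuple


def make_file_number(year: int, month: int, item: int) -> str:
    """Build a six-digit file number: YY (year), MM (meeting month), NN (item)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if not 0 <= item <= 99:
        raise ValueError("item number must fit in two digits")
    return f"{year % 100:02d}{month:02d}{item:02d}"


def parse_file_number(file_number: str) -> Tuple[int, int, int]:
    """Split a six-digit file number into (two-digit year, month, item)."""
    if len(file_number) != 6 or not (file_number.isascii() and file_number.isdigit()):
        raise ValueError("file number must be exactly six digits")
    return int(file_number[:2]), int(file_number[2:4]), int(file_number[4:])


# Hypothetical example: item 7 recorded at the June 1981 meeting.
print(make_file_number(1981, 6, 7))   # 810607
print(parse_file_number("810607"))    # (81, 6, 7)
```

Because the year and month lead the number, sorting file numbers as strings also sorts them by meeting date within a century.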

This  relatively  simple  file-number  system  permits  one  to  rather  quickly  refer 
to  the  appropriate  minutes  for  more  details,  when  desired.  The  automated 
action  item  file  contains,  in  addition  to  the  file  number,  a  description  of 
the  action  item  (200  character  limit),  due  date,  complete  date,  primary  respon¬ 
sibility  code,  support  responsibility  code,  and  a  remarks  field  (200  character 
limit).  Listings  of  action  items  are  provided  ESG  members  at  various  times  for 
use  in  managing  items  assigned  and  to  report  completion  or  a  change  in  the  status. 
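
The record layout described above maps naturally onto a small data structure. The Python sketch below is an assumption-laden illustration, not the original implementation: the class and field names are hypothetical, and the format of the responsibility codes is not given in the paper, so they are treated as plain strings.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_TEXT = 200  # character limit stated for the description and remarks fields


@dataclass
class ActionItem:
    file_number: str                      # six-digit number (year, month, item)
    description: str                      # limited to 200 characters
    due_date: Optional[date]
    primary_code: str                     # primary responsibility code (format assumed)
    support_code: str = ""                # support responsibility code (format assumed)
    complete_date: Optional[date] = None  # None while the item is still open
    remarks: str = ""                     # limited to 200 characters

    def __post_init__(self) -> None:
        # Enforce the two field-length limits when the record is created.
        for name in ("description", "remarks"):
            if len(getattr(self, name)) > MAX_TEXT:
                raise ValueError(f"{name} exceeds {MAX_TEXT} characters")

    @property
    def is_open(self) -> bool:
        return self.complete_date is None


# Hypothetical item from a June 1981 meeting.
item = ActionItem("810607", "Review test schedule", date(1981, 7, 15), "A")
print(item.is_open)  # True
```

Keeping the completion date empty until an item is closed makes a listing of open items a simple filter on that one field.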

Prior  to  the  June  ESG  meeting  each  member  was  furnished  with  a  complete  listing  of 
all  action  items  (62)  since  the  January  meeting.  At  the  June  meeting  the  purpose 
and  uses  of  the  action  item  listings  were  discussed.  Attendees  were  asked 
to  provide  an  update  for  each  item  for  which  they  had  primary  responsibility. 
Following  the  June  meeting  a  new  listing  was  provided  for  each  primary
responsibility  code  and  attached  to  the  ESG  minutes.  This  new  listing  included  the
action  items  from  the  June  meeting. 

Another  milestone  in  the  development  of  the  "management  team"  was  reached  in  July 
when  the  ESG  adopted  a  new  format  for  conducting  subsequent  meetings.  Essentially, 
the  new  format  simplified  the  ESG  agenda.  Meetings  will  be  conducted  by  first 
reviewing  and  approving  the  previous  minutes  with  any  corrections.  Second,  the 
ESG  will  review  the  open  action  items  from  previous  meetings.  And  lastly,  new 
topics  and  issues  will  be  presented  and  discussed.  The  new  format  simplifies  the 
agenda,  focuses  specific  attention  on  current  status  of  assignments  and  makes  it 
easier  for  participants  to  recognize  new  business.

A  complete  set  of  ESG  minutes  and  an  action  item  listing  were  a  great  source  of 
information  for  a  recent  ESG  member  who  joined  the  group  for  the  first  time  in 
July  1981.  These  two  sets  of  documents  summarized  for  him  what  had  been  accom¬ 
plished  and  discussed  over  the  past  few  months,  the  time  sequence  in  which 
decisions  had  been  made,  and  the  relationships  of  certain  action  items.  Some
items  are  cross-referenced  by  file-numbers  in  the  remarks  data  field.  As  other 
ESG  participants  join  this  group,  we  will  see  even  more  uses  for  the  effort 
invested  in  resolving  important  issues  as  a  group,  in  recording  the  results  of  these
actions,  and  in  tracking  and  maintaining  control  over  the  one-time  and  continuing
evaluation  activities. 
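
Cross-references of the kind mentioned above, where file numbers are written into the remarks field, could be pulled out mechanically. The Python sketch below assumes that any standalone six-digit run in the remarks is a file number, which the paper does not state; the function name and sample text are hypothetical.

```python
import re
from typing import List

# A six-digit run that is not part of a longer digit string.
FILE_NUMBER = re.compile(r"(?<!\d)\d{6}(?!\d)")


def cross_references(remarks: str, own_number: str = "") -> List[str]:
    """Return the distinct file numbers mentioned in a remarks field, in order."""
    found: List[str] = []
    for number in FILE_NUMBER.findall(remarks):
        if number != own_number and number not in found:
            found.append(number)
    return found


print(cross_references("See 810103 and 810204; supersedes 810103."))
# ['810103', '810204']
```

A new member reading the listing could follow these references back to the minutes of the meeting identified by the first four digits.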

Another  milestone  in  the  development  of  the  "management  team"  was  reached  in  the 
August  meeting.  This  was  the  first  time  that  representatives  from  Central  Test 
Site  detachments  had  attended  these  meetings.  The  detachments  are  very  much  in 
the  front  line  of  PTEP  test  administration  and  of  feeding  back  the  results  to 
the  SSBNs.  Very  often  they  (detachments)  are  the  first  to  sense  or  hear  of  a
problem,  and  it  is  important  that  they  understand  why  certain  projects  are  being
undertaken  and  how  they  are  progressing.  Additionally,  a  representative  from
Chief,  Navy  Technical  Training  (CNTT),  also  sat  in  on  the  meeting.  The  Central
Test  Site  is  under  the  command  of  CNTT.  Both  the  detachments  and  CNTT  representa¬ 
tives  had  an  opportunity  to  see  how  the  ESG  operates  and  the  full  range  of  PTEP 
evaluation  responsibilities  being  controlled  and  coordinated  by  this  "management 
team."  The  openness  of  the  ESG  to  input  from  and  observation  by  others  is 
another  indication  that  the  group  has  experienced  some  growth. 

The  ESG  has  found  a  balance  between  managing  the  evaluation  process  and
utilizing  the  technical  expertise  necessary  for  the  development  of  evaluation
methodologies,  the  implementation  of  these  plans,  and  the  conduct  of  evaluations
per  se.  Technical  expertise  abounds  within  the  resources  available  to  the  ESG.
There  are  subject  matter  experts  familiar  with  the  complete  range  of  curricula, 
hardware  and  documentation. 

Other  members  are  well  trained  in  statistical  analysis,  psychometrics,  educational 
research,  test  design  and  construction,  information  systems,  and  management 
science  (including  organization  development).  Individual  commitment  to  PTEP 
evaluation  by  this  very  diverse  group  also  made  a  contribution  to  the  limited 
success  of  the  ESG.  Some  members  contribute  more  from  a  technical  point  of  view 
while  others  are  more  attuned  to  the  process  by  which  we  collectively  seek  to 
achieve  long  term  dividends  from  all  PTEP  evaluation  efforts. 


CONCLUSIONS  AND  SUMMARY 


The  ESG  came  into  being  (was  born)  out  of  a  genuine  need.  The  environment  was  ripe
for  something  which  would  provide  more  control,  integrate  and  coordinate  all  of 
the  evaluation  efforts,  and  at  the  same  time  improve  communications.  It  took 
a  great  deal  of  patience,  attention  to  detail,  emphasis  upon  preparation  and 
following  up  of  meetings,  and  a  diligence  toward  the  organizational  development 
process.  In  the  beginning,  some  of  the  ESG  members  were  more  open  and  trusting 
than  others.  Each  individual  has  a  separate  set  of  perspectives  and  brings 
them  to  the  ESG.  And  we  found  value  in  and  some  truth  to  each  of  these  perspec¬ 
tives.  In  our  attempt  to  gain  better  control,  move  the  PTEP  evaluations  forward, 
and  to  grow  as  professionals  we  were  forced  to  learn  from  each  other  and  about 
ourselves.  We  recognized  anew  that  often  growth  does  not  come  easy.  It  is 
sometimes  hard  to  listen,  and  even  more  difficult  to  really  hear  that  which 
one  would  rather  not  be  told.  However,  we  have  listened  and  have  learned  from
each  other.  The  author  feels  that  the  members  of  the  ESG  experienced  a  great 
deal  of  growth  during  the  past  several  months  as  we  dealt  with  evaluation  issues 
from  many  different  perceptions. 

As  a  group,  we  undertook  several  new  tasks  designed  to  improve  the  overall 
performance  of  PTEP.  These  actions  were  carried  out  in  the  dynamic  environment 
described  earlier.  Schedules  were  adjusted,  priorities  were  realigned,  and 
resources  were  reallocated  in  light  of  new  information.  The  ESG  did  not  try 
to  resolve  all  difficulties,  but  did  try  to  deal  with  differences  in  a  mature 
and  understanding  manner.  We  attempted  to  focus  upon  tasks  to  be  accomplished, 
while  growing  in  our  capacity  to  accept  the  fact  that  we  could  disagree  without
being  altogether  disagreeable.  We  have  achieved  a  measure  of  success  on
both  counts.  A  review  of  the  major  milestones  confirms  this  fact.  The
number  of  action  items  assigned  (100)  and  completed  (82)  during  this  period  also
attests  to  the  fact  that  we  had  gained  better  control  of  evaluation  projects
than  one  would  normally  expect  with  conventional  management  methods.  The  underlying
commitment  of  individual  ESG  members  to  the  process  of  evaluation  as
embodied  in  PTEP  for  the  SWS  Training  Program  helped  us  band  together  and  to 
develop  this  "management  team." 

The  importance  of  the  ESG  in  strengthening  PTEP  evaluation  efforts  resulted  from 
a  lot  of  hard  work.  The  results  described  above  did  not  suddenly  and  miraculously 
burst  forth  in  response  to  our  real  need.  Much  of  the  work  went  on  between  the
meetings  themselves:  work  that  was  supportive  in  nature,  that  took  care  of  important
details,  and  that  was  sensitive  to  good  management  principles  and  organization  development.
The  author  does  not  intend  to  suggest  that  ESGs  are  "the  answer"  to  other  evaluation 
efforts,  but  rather  that  this  process  has  served  our  group  well. 

In  this  context  a  few  suggestions  are  offered  to  those  who  might  want  to  use 
the  basics  of  this  concept  in  another  environment.  These  suggestions  are  grouped 
into  three  categories.  First,  keep  things  as  simple  and  straightforward  as 
possible.  Make  good  use  of  time  by  developing  and  distributing  agendas  before 
the  meetings.  Then  conduct  the  meetings  as  "working  sessions"  in  accordance  with 
the  agenda.  Keep  the  team  small  and  productive  —  not  a  convention.  Subsequent 
to  each  meeting  prepare  and  distribute  brief  minutes  that  are  reviewed,  corrected 
and  approved  at  the  next  meeting.  Second,  the  author  would  suggest  that  leadership
of  the  group  should  recognize,  accept  and  be  able  to  deal  with  different
perspectives.  The  author  means  by  this  statement  that  an  opportunity  should  be
given  for  different  points  of  view  to  be  expressed  before  important  decisions  are 
made  that  will  have  an  impact  upon  the  group.  Likewise,  it  is  important  for  this 
leadership  to  follow  up  on  differences  that  exist  between  and  among  members  so 
that  they  can  continue  to  express  differences  without  being  altogether  disagree¬ 
able.  The  third  suggestion  is  that  an  audit  trail  or  record  of  group  decisions 
be  provided.  The  minutes  from  the  meetings  may  serve  this  purpose  completely. 

If  they  do  not,  then  other  measures  should  be  taken.  For  example,  when  the 
total  number  of  actions  is  quite  large,  membership  in  the  group  changes,  and 
there  is  a  need  to  retrieve  selected  material  from  time  to  time,  an  automated 
system  may  be  required  in  addition  to  the  minutes.  The  author  also  considers 
the  audit  trail  an  important  tool  or  device  for  orienting  new  team  members  as
well  as  those  who  have  an  occasional  need  to  know  what  is  going  on. 

Evaluations  are  seldom  conducted  where  all  of  the  variables  are  well  controlled. 
In  PTEP,  only  a  few  of  the  variables  are  controlled.  Hence,  the  analysis  and 
interpretation  of  PTEP  data  are  difficult,  but  not  impossible.  The  key  is  in 
creatively  using  what  is  available  and  in  managing  the  process  well.  The  author 
is  pleased  with  the  role  that  the  ESG  has  played  in  PTEP  evaluation  and  anticipates
that  this  "management  team"  will  continue  to  be  an  important  means  by
which  we  maintain  control,  provide  appropriate  levels  of  coordination,  and 
supplement  communications  between  and  among  the  key  participants  in  the  dynamic 
SWS  Training  Program. 

This  study  investigated  the  effect  of  expectations  on  attrition
in  Canadian  Forces  (CF)  recruit  training.  The  820  male  participants
included  699  who  successfully  completed  recruit  training  (stayers)
and  121  who  failed  or  withdrew  (leavers) .  Participants  completed  a 
pre-test  questionnaire  within  two  days  of  arrival  at  recruit  school 
and  a  post-test  questionnaire  after  three  weeks  of  training. 
Expectations  of  both  groups  were  measured  against  a  12  member 
criterion  group,  comprised  of  senior  recruit  instructors,  recruiting 
officers  and  personnel  selection  officers.  Findings  indicate  that 
expectations  of  stayers  and  leavers  were  not  significantly  different 
nor  did  leavers  report  a  greater  degree  of  disconfirmed  expectations.
Leavers  did,  however,  have  a  greater  ratio  of  expectations  disconfirmed
in  a  "worse  than  expected"  direction.  A  relationship  was  found
between  attrition  and  expectations  of  success  in  recruit  training,
membership  in  the  CF  after  recruit  training  and  intended  length  of
service,  with  leavers  responding  more  negatively  in  all  three  areas. 
Although  assessment  of  pre-training  information  did  not  differ  for 
the  two  groups,  the  responses  of  both  groups  shifted  in  a  negative 
direction  on  the  post-test.  Implications  of  the  findings  for  the  CF
are  discussed.

As  inflation  continues  to  escalate,  most  segments  of  western  society
are  experiencing  increased  financial  pressures.  This  is  particularly  so 
in  military  organizations,  where  it  is  becoming  increasingly  difficult 
to  justify  huge  monetary  expenditures  to  adequately  train  and  equip 
military  forces.  The  Canadian  Forces  (CF)  has,  in  recent  years,  been 
closely  scrutinized  by  both  government  and  citizens  in  regard  to  monetary 
expenditures,  and  it  has  become  imperative  that  every  dollar  of  the 
military  budget  is  spent  wisely  and  effectively.  This  close  monetary 
scrutiny,  while  affecting  all  aspects  of  the  CF,  has  been  particularly 
focused  on  the  recruiting  and  selection  processes  utilized  to  maintain 
necessary  manpower  levels.  Recruitment,  enrolment  and  retention  of 
suitable  personnel  has  become  an  increasingly  critical  issue. 

In  addition  to  imposed  financial  constraints,  recent  research 
(Cotton,  1974)  indicates  that  the  pool  of  potential  recruits  for  the  CF 
will  begin  to  shrink  in  the  early  1980s.  Concurrently,  competition  with 
industry  for  the  available  manpower  pool  is  increasing  steadily.  As  well, 
the  CF  will  require  a  greater  proportion  of  high  quality  personnel  to 
cope  with  advancing  military  technology.  Financial  constraints, 
increased  competition  for  recruits  and  the  need  for  better  recruits
demand  that  recruiting  become  increasingly  cost  effective,  with  attrition
at  the  recruit  training  level  becoming  a  critical  problem. 

It  is  doubtful  that  attrition  in  recruit  training  is  precipitated 
by  a  single  definable  factor,  but  more  likely  results  from  a  number 
of  interacting  factors.  The  factor  of  expectations  is  considered 
important  in  explaining  attrition  from  any  organization.  Extensive 
research  has  supported  the  assumption  that  prospective  organization 
members  bring  with  them  a  complex  expectation  set  of  their  role  and 
membership  in  their  chosen  organization.  Wiskoff  (1977,  p.  v)
observed  that  "expectations  have  been  shown  to  exercise  a  pervasive 
influence  on  decision  processes  of  individuals  to  join,  develop  occupa¬ 
tional  choices,  remain,  or  derive  satisfaction  from  the  military."  That 
the  expectations  brought  by  prospective  organization  members  are  often
unrealistic  has  also  been  well  documented  by  previous  research.  This
phenomenon  may  be  particularly  true  for  new  members  of  the  military  who
encounter  an  environment  that  is  alien  to  previous  experiences.

In  considering  the  relationship  between  expectations  and  attrition, 
the  degree,  area  and  direction  of  disconfirmation  of  expectations  are 
of  primary  interest.  Previous  research  indicates  it  is  not  necessarily 
the  degree  of  disconfirmation  that  contributes  to  attrition,  but  the 
direction  or  area  of  disconfirmation,  or  a  combination  of  both  direction 
and  area.  This  study  investigated  the  interrelationships  of  degree, 
direction  and  area  of  disconfirmation  of  expectations  and  their  relation¬ 
ship  with  attrition  in  CF  recruit  training.  Although  previous  CF 
studies  have  acknowledged  that  expectations  are  a  major  contributing 
factor  to  attrition  during  recruit  training  [e.g.,  Cotton  (1974,  1975);
Fournier  and  Keates  (1975);  Rampton  (1973);  Mullin  (1977)],  there  has  not
been  a  sustained  focus  on  the  extent  to  which  expectations  affect
attrition.  The  implications  of  research  in  this  area  could  be  significant
for  the  CF,  with  the  possibility  of  reduced  attrition  during  recruit
training,  a  concomitant  reduction  in  the  unsettling  effect  leavers  have
on  other  recruits,  and  reduced  recruiting  quotas  resulting  from  reduced
attrition. 

In  view  of  continuing  financial  constraints,  a  shrinking  recruit
pool,  increased  competition  with  industry  for  recruits  and  high
attrition  rates  during  CF  recruit  training  (averaging  20%  between
1974  and  1979),  it  is  timely  to  examine  the  effect  of  expectations  on
attrition  during  CF  recruit  training.  This  study  attempts  to  answer 
a  number  of  critical  questions  regarding  this  phenomenon.  To  begin 
with,  are  the  expectations  of  recruit  training  held  by  leavers  less 
realistic  than  those  of  stayers?  Do  leavers  report  a  greater  degree 
of  disconfirmed  expectations?  If  so,  are  more  leavers'  expectations 
disconfirmed  in  critical  areas  and/or  in  a  worse-than-expected  direction?
Do  leavers  have  a  less  positive  expectation  set  toward  membership  in 
the  CF  after  recruit  training?  Is  the  expectation  of  success  in 
recruit  training  less  for  leavers?  Is  the  intended  period  of  service 
less  for  leavers?  Do  leavers  classify  pre-enrolment  information  as
being  less  accurate  and  sufficient  than  do  stayers? 

Previous  Research 


The  various  theoretical  models  developed  in  recent  years  to  explain 
the  influence  of  expectations  in  human  behavior  offer  interesting
implications  for  prospective  or  new  organization  members.  Current
theory  suggests  a  strong  relationship  between  expected  and  actual
outcomes,  and  their  attractiveness  to  the  individual,  and  the  energy
expended  toward  becoming  a  productive  organization  member.  Research
supports  the  contention  that  the  act  of  joining  a  group  or  organization
is  carried  out  for  the  purpose(s)  of  goal  and  need  satisfaction.  Upon
committing  himself  to  an  organization,  the  individual  has,  as  much  as
possible,  become  convinced  through  assimilated  information  from  various
sources,  that  expected  goals  and  needs  can  be  attained.  Gauntner  (1973),
Owens  (1970),  Weston  (1974)  and  Youngberg  (1963)  all  alluded  to  the  strong 
relationship  between  pre-enlistment  expectations  and  satisfaction  with 
the  military  of  new  members.  Rampton  (1973)  commented  that  if  an 
organization  hopes  to  "attract  or  keep  sufficient  personnel"  then  it 
must  be  perceived  "to  offer  an  environment  that  meets  the  individual's
expectations  and  needs  as  well  or  better  than  other  available  options"
(p.  28).  The  pervasive  influence  of  expectations  in  determining  job
motivation,  performance,  satisfaction  and  turnover  has  been  well
documented  by  research.  Macedonia  (1969),  Owens  (1970),  Mullin  (1977),
Wiskoff  (1977),  Bourne  (1965),  Griffith  et  al.  (1979),  Ilgen  (1975),
Scott  (1972)  and  Youngberg  (1963)  all  commented  on  the  potential  for
dissatisfaction  and  eventual  attrition  if  expectations  of  what  will 
happen  in  a  stressful  environment,  such  as  recruit  training,  are  unreal¬ 
istic  and  therefore  unmet. 

For  the  military,  the  group  most  strongly  identified  with  the  question 
of  realistic  and/or  unrealistic  expectations  of  new  members  are  the 
recruiters.  Expectancy  theorists  have  been  less  than  kind  to  military 
recruiters,  openly  questioning  their  integrity  and  credibility.  As  the 
interface  between  the  military  and  the  potential  enrollee,  the  recruiter 
faces  a  difficult  challenge  in  presenting  one  to  the  other.  This  is
particularly  true  in  the  process  of  developing,  in  the  applicant,  a
realistic  expectation  set  of  recruit  training,  which  inevitably  includes
modifying  unrealistic  expectations.  In  fairness  to  recruiters,  the 
majority  probably  do  it  as  best  they  can;  some  may  not  appreciate  the 
critical  role  they  play  in  shaping  short  and  long  term  attitudes  of 
military  recruits. 

Basically,  the  recruiter's  role  is  to  close  the  expectation  gap 
between  the  recruit  and  the  organization.  In  recent  years,  extensive 
research  has  been  conducted  in  seeking  effective  methods  of  closing  the 
expectation  gap.  Much  of  it  has  focused  on  the  utility  of  providing 
realistic  and  accurate  information  to  new  organization  members.  Weitz's 
(1956)  study  was  a  landmark  in  this  area  when  he  demonstrated  that 
provision  of  realistic  information  resulted  in  reduced  attrition  and  an 
increased  tendency  to  join  an  organization.  Macedonia  (1969)  produced 
similar  results  in  a  study  of  entering  USMA  cadets.  He  found  that 
provision  of  realistic  information  ferreted  out  impulsive  applicants, 
that  enrollees  accepted  greater  personal  responsibility  for  their  decision 
to  join  and  that  enrollees  were  better  prepared  for  the  situational  stresses 
with  which  they  were  confronted.  Studies  by  Farr  et  al.  (1972),  Bittinger
(1964),  Wanous  (1973)  and  Sherman  (1959)  produced  similar  results.

Scott  (1972)  emphasized  the  necessity  for  recruiters  to  realize  that 
selection  is  a  mutual  process  involving  both  the  applicant  and  the 
organization.  Scott  believed  that  selection  is  too  frequently  approached 
from  the  organization  perspective  only,  with  recruiters  gathering  reams
of  information  about  the  applicant  but  not  reciprocating  with  necessary 
information  about  the  organization.  Wanous  (1973)  drew  conclusions 
similar  to  Scott's,  stating  that  industrial  psychologists  are  pre¬ 
occupied  with  the  organization's  viewpoint,  at  the  applicant's  expense. 

The  1973  study  by  Wanous  utilized  the  concept  of  "realistic  job  previews", 
which  resulted  in  more  realistic  job  expectations  and  no  difference  in 
job  acceptance  rates  between  control  and  experimental  groups. 

Previous  expectancy  research  has  also  focused  on  the  problem  of 
identifying  potential  leavers  before  or  shortly  after  enrolment.  Various 
theories  have  been  postulated  to  resolve  this  issue.  Ilgen  (1975),
Schuckit  and  Herman  (1978)  and  Mullin  (1977)  believed  that  the  impulsive
individual,  who  enrolled  without  giving  serious  consideration  to
implications  of  joining  the  military,  quite  often  fails  or  quits  during
recruit  training.  Expectations  of  success  in  early  training  and
employment  (e.g.,  Mobley  et  al.,  1977),  level  of  formal  education  (e.g.,
Lockman,  1976  and  Cotton,  1974),  category  of  service  (O'Gorman,  1972)  and
age  (Guinn,  1977)  have  also  been  put  forth  as  indicators  of  potential
attrition.

Barrett  (1975)  in  Wiskoff  (1977)  most  succinctly  summed  up  the 
large  picture  of  what  the  far-sighted  goal  of  all  expectation  theorists 
should  be  when  he  postulated  that: 

.  .  .  there  is  an  optimal  match  or  congruence  among 
abilities,  preferred  attributes,  expectancies  and 
task  complexities  which  will  result  in  maximization 
of  resources  in  terms  of  individual  productivity, 
work  satisfaction,  and  organization  tenure  (p.  20). 

Method


Subjects: The study, conducted in early 1979, involved 820 male
recruits who underwent training at the Canadian Forces Recruit School
(CFRS). Of the 820 participants, 699 successfully completed recruit
training (stayers), while 121 failed or were released at their own request
(leavers). The heterogeneous sample included participants from
every province and territory in Canada and from a wide variety of
socio-economic backgrounds; participants ranged in age from 17 to 25 years.

Procedure: The study employed a six-part questionnaire for data
collection.  In  Part  I,  demographic  data  were  collected.  Part  II 
contained  55  items  on  which  the  respondent  indicated  how  much  of  each 
item  he  expected  to  experience  in  recruit  training.  Parts  III  and  IV 
contained the same items and format as Part II, with respondents being
requested  to  indicate  preferred  amount  of  each  item  and  importance  of 
finding  preferred  amount  respectively.  Part  V  consisted  of  three  items 
pertaining  to  (a)  expected  success  in  recruit  training,  (b)  length  of 
intended service in the CF, and (c) assessment of accuracy of
pre-training information provided by recruiters. Part VI was a survey of
post-CFRS  expectations  of  membership  in  the  CF.  For  this  study,  only 
Parts  II,  V  and  VI  were  analyzed.  An  evaluation  of  the  questionnaire 
with  a  sample  of  66  recruits  led  to  minor  modifications  in  format  and 
clarification of some instructions. Item analyses of the questionnaire
were  conducted,  and  an  alpha  reliability  coefficient  of  0.94  was 
obtained.  The  instrument  was  designed  by  the  author,  with  items  being 
chosen  on  the  basis  of:  recruit  information  films,  recruiting  brochures, 
consultation  with  CFRS  staff  and  experienced  recruiters  and  the  author’s 
own  experience  as  a  Personnel  Selection  Officer  (PSO)  in  both  recruiting 
and  on  operational  military  bases. 
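
The alpha reliability coefficient reported above is not defined further
in the text. As a minimal sketch, assuming it refers to Cronbach's
coefficient alpha computed over the questionnaire items, the calculation
could look like the following. The function name, the data layout and the
toy response matrix are illustrative assumptions, not material from the
study.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Return coefficient alpha for a respondents-by-items score matrix.

    `responses` is a list of rows, one per respondent, each holding that
    respondent's score on every item (hypothetical layout).
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [pvariance(column) for column in zip(*responses)]
    # Variance of each respondent's total score.
    total_variance = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented toy data: four respondents answering three items.
toy_responses = [
    [1, 2, 1],
    [2, 3, 3],
    [4, 4, 5],
    [5, 5, 6],
]
alpha = cronbach_alpha(toy_responses)
```

Values near 1 indicate that the items vary together across respondents;
the 0.94 reported for the questionnaire would fall in that range.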

A  pre-test  questionnaire  was  administered  during  the  recruits' 
first  two  days  at  the  CFRS.  The  post-test  was  administered  at  the 
completion  of  three  weeks  of  recruit  training.  Questionnaires  were 
administered  by  the  PSO  staff  at  the  CFRS  to  groups  of  participants 
averaging  83  in  number  with  respondents  being  assured  anonymity.  The 
procedure  resulted  in  820  usable  questionnaires. 

The  55  expectation  items  were  also  administered  to  a  12-member 
criterion group which consisted of senior instructors and the PSOs from
the  CFRS,  recruiting  officers  and  the  author.  Arithmetic  means  of  their 
responses  to  each  item  were  established  as  an  accurate  measure  of  what 
recruits  could  realistically  expect,  in  regard  to  the  55  expectation 
items,  at  the  CFRS. 


Results 


Discriminant  analysis  procedures  were  conducted  to  compare  responses 
of  the  two  groups  on  both  the  pre  and  post  tests.  T-tests  were  also 
conducted  using  a  p<.05  level  of  significance. 
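
The paper does not say which form of t-test was used. As a rough sketch,
assuming a pooled-variance two-sample t-test on a single questionnaire
item, the stayer/leaver comparison could be computed as below. The p-value
uses a normal approximation, an assumption that is reasonable only for
large groups such as the 699 stayers and 121 leavers; the scores shown are
invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t_test(group_a, group_b):
    """Return (t, degrees of freedom, approximate two-sided p-value)."""
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance of the two groups.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / dof
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(group_a) - mean(group_b)) / standard_error
    # Normal approximation to the t distribution (assumes large samples).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, dof, p_approx


# Invented scores on one item for a handful of stayers and leavers.
stayer_scores = [5, 6, 5, 4, 5, 6, 5]
leaver_scores = [4, 5, 4, 3, 5, 4, 4]
t_stat, dof, p_approx = pooled_t_test(stayer_scores, leaver_scores)
is_significant = p_approx < 0.05  # the p<.05 criterion used in the study
```

With small groups an exact t distribution should replace the normal
approximation.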

Expectations of leavers were not dissimilar from those of the
stayers, in comparison to the criterion group. Leavers had 15 of the
55 expectation items significantly different from the criterion group
while  stayers  had  16.  Of  the  15  items  on  which  the  leavers  differed 
from  the  criterion  group,  13  were  the  same  as  the  items  on  which  the 
stayers  differed  from  the  criterion  group. 

Disconfirmation: Leavers did not report a greater degree of
disconfirmed expectations than did stayers. Leavers did have a greater
ratio of expectations disconfirmed in a "worse than expected" direction,
while disconfirmation for stayers was almost the same in both directions.
Table 1 clarifies these results.

Table 1

Disconfirmed Expectations for Stayers and Leavers
with Percentage and Direction of Disconfirmation

           No. of         Percentage     "Better than   %        "Worse than   %
           expectations   disconfirmed   expected"      better   expected"     worse
           disconfirmed                  direction               direction

Stayers:   44             86.275         21             47.727   23            52.273
Leavers:   30             58.824          9             30.000   21            70.000
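
The percentages in Table 1 follow from simple ratios, sketched below. One
assumption is needed: the "percentage disconfirmed" figures (86.275 and
58.824) are consistent with a base of 51 items rather than the 55 items
administered, so 51 is used here as an inferred value that the text does
not state. The function name is illustrative.

```python
def direction_breakdown(n_disconfirmed, n_worse, n_items_compared=51):
    """Recompute the Table 1 percentages for one group.

    n_items_compared=51 is inferred from the reported percentages,
    not stated in the paper.
    """
    n_better = n_disconfirmed - n_worse
    return {
        "n_better": n_better,
        "pct_disconfirmed": round(100 * n_disconfirmed / n_items_compared, 3),
        "pct_better": round(100 * n_better / n_disconfirmed, 3),
        "pct_worse": round(100 * n_worse / n_disconfirmed, 3),
    }


stayers = direction_breakdown(n_disconfirmed=44, n_worse=23)
leavers = direction_breakdown(n_disconfirmed=30, n_worse=21)
```

The contrast the text draws is between the two "worse" shares: roughly
half of the stayers' disconfirmed expectations against 70 percent of the
leavers'.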

Critical Dimensions: Eleven critical dimensions were identified.
Leavers did not report more disconfirmed expectations in critical areas
than did stayers. Leavers did, however, have more critical dimensions
disconfirmed in a "worse than expected" direction than did stayers.


Post-CFRS  Attitudes:  Leavers  were  found  to  have  a  less  positive 
expectation  set  about  membership  in  the  CF  after  recruit  training.  Table 
2  illustrates  these  results:  on  the  pre-test,  stayers  were  less  positive 
than  leavers  on  seven  items,  while  leavers  were  less  positive  on  13 
items. On the post-test, stayers were less positive on three items while
leavers were less positive on 17 items. This reveals a shift in post-CFRS
attitudes for both the stayers (more positive) and leavers (less
positive) from the pre- to post-test.


Table 2

Post-CFRS Expectation Sets for
Stayers and Leavers - Pre and Post Test

           No. of items    % items         No. of items    % items
           less positive   less positive   less positive   less positive
           (pre-test)      (pre-test)      (post-test)     (post-test)

Stayers:    7              35.0             3              15.0
Leavers:   13              65.0            17              85.0

Expectations of Success: Leavers' expectations of success in recruit
training were less positive than those of stayers, on both the pre- and
post-test. In fact, the negative shift by the leavers from the pre-
to post-test was also significant (p<.05). The leavers, therefore, gave
themselves even less chance of successfully completing recruit training
after three weeks at the CFRS. Table 3 demonstrates these results.

Table 3

Stayers and Leavers Expectations of
Success in Recruit Training - Pre and Post Test

                             Stayers    Leavers    Significance
                                                   of difference

Mean response, pre-test      *4.9941    4.5576     .001
Mean response, post-test      4.9842    4.0781     .001

*Range of response: 1 through 6.


Intended Length of Service: Leavers' intended length of service was
less than that of stayers. Stayers intended to serve longer, on both
the pre and post tests, with their intention increasing from the pre to
post test, albeit not significantly. Leavers' intended length of service
decreased from the pre to post test (p<.05).
Table 4 elaborates on the results.

Table 4

Stayers and Leavers Intended
Length of Service - Pre and Post Test

                             Stayers    Leavers    Significance
                                                   of difference

Mean response, pre-test      *7.1073    6.6552     .006
Mean response, post-test      7.1848    6.0500     .001

*ftange'”bf  response:  T  through  67 


Assessment  of  Pre-Training  Information:  No  difference  was  found 
between  stayers  and  leavers  in  assessment  of  pre-training  information. 
What  is  particularly  interesting  about  this  hypothesis,  however,  is 
the dramatic difference (p<.05) between pre and post test responses
for both the stayers and leavers. On the pre-test 71.45% of stayers
and 67.36% of the leavers assessed pre-training information as being
accurate and sufficient. On the post test only 37.61% of the stayers
and  32.26%  of  the  leavers  assessed  pre-training  information  as  being 
adequate  and  sufficient.  Table  5  illustrates  these  results. 

Table 5

Stayers and Leavers Assessment
of Pre-Training Information -
Pre and Post-Test

                             Stayers    Leavers    Significance
                                                   of difference

Mean response, pre-test      .7145      .6786      .468
Mean response, post-test     .3761      .3226      .400

Discussion 


Findings  regarding  expectations  of  CF  recruit  training  were  not 
consistent with other recent studies indicating a difference in
expectations between stayers and leavers. The fact that this study did not
reveal a difference in expectations of recruit training between stayers
and leavers suggests that accuracy of expectations, in general, may not
be a decisive factor in influencing attrition in CF recruit training.

In this study, degree of disconfirmation of expectations did not
differ significantly for stayers and leavers. These findings also
differ from recent research and suggest that degree of disconfirmation
is not a decisive factor in attrition of CF recruits. Owens (1970)
reached the same conclusion, based partly on the assumption, also
documented by other studies, that new organization members (particularly
military) lack previous experience on which to draw in developing a
base of realistic expectations. Adjustment to disconfirmation of
expectations, therefore, appears to be a salient factor in explaining
attrition during recruit training, with previous studies [Wiskoff (1977);
Wadsworth (1974); Hammond (1966)] indicating that adjustment was usually
in a downward direction.

Direction of disconfirmation also appears to be a differentiating
factor in attrition, with leavers experiencing more disconfirmed
expectations in a "worse than expected" direction. In particular,
disconfirmation in a negative direction in critical areas (e.g.,
specifically identified variables) may be a decisive factor in influencing
attrition.

In a 1977 study, Mobley, Hand and Logan attributed the differences
between stayers' and leavers' ability to adjust to disconfirmation
to differences in "certain personality trait dimensions"
such as confidence, anxiety, self-image, etc. These perceived traits
negatively affected some individuals' ability to cope with the anticipation
and reality of heavy stress (leavers), while others were more
flexible in adjusting expectations to reality.

Pre-enrolment  identification  of  potential  leavers  appears  feasible 
from  results  of  this  study.  It  was  found  that  leavers  have  a  less 
positive  attitude  than  stayers  toward  membership  in  the  CF  after  recruit 
training,  give  themselves  less  chance  of  successfully  completing  recruit 
training, and are more tentative in their commitment to the CF. These
results support Wiskoff's (1977) contention that "expectations about the
future exercise more influence as a determiner of career intent than
present conditions" (p. 27). Porter and Steers (1973) also saw expectations
of the future as having a pervasive influence on career intent and
career decisions of recruits. Identifying potential recruits who are
negative or doubtful about what to expect in the military would
facilitate either screening-out procedures or provision of counselling
at the selection stage to modify attitudes and expectations.

The fact that both stayers and leavers were similar in their
assessment of pre-enrolment information was somewhat surprising.
Previous research indicated that leavers' assessment of such information
would be less positive than stayers'. The fact that almost one-third of
both groups were dissatisfied with pre-enrolment information, even before
commencement of recruit training, suggests several possibilities. It
could be that information being given recruits about the recruit training
experience is in fact not accurate and sufficient or, for some reason, the
recruits are not internalizing the information. Possibly, a combination
of the above may occur, partly due to the phenomenon labelled "selective
hearing". The significant negative shift in assessment of pre-training
information after three weeks of training warrants serious consideration.
If only the leavers' assessment had shifted negatively, one might
attribute it to rationalization and displacement of responsibility for
the inability to adjust to the rigors of recruit training. These
results do suggest that when disconfirmation does occur, it is attributed
to pre-training information. An examination of pre-training information
may be in order. These findings also further support the contention
that stayers are better equipped to adjust to disconfirmation.

Reinharth and Wahba (1975) postulated that the expectancy model
could emerge as an influential approach to explaining work behaviour if
it could be proven to effectively "describe and predict work motivation,
job effort and job performance" (p. 521). Utilization of the expectancy
model to effectively discriminate between stayers and leavers in CF
recruit training lends support to their postulation. The implications
of using the expectancy model to identify potential leavers before
or shortly after enrolment could be far-reaching. Measurable savings
in human and financial resources could accrue from the ability to
differentiate between these two groups during the enrolment process.

In these days of a shrinking recruit pool, increased competition from
industry for recruits, and continual financial constraints, the ability
to reduce attrition is crucial. Analysis of the remaining data collected
in this study should provide further insight into the role of
expectations in attrition during recruit training.

The Honor Code: The Optimum Criterion of Honor?


The  Honor  System  at  West  Point  is  renowned  as  the  most  important 
single  influence  on  the  lives  of  graduates.  A  cadet  must  be  honest 
and,  harder  yet,  not  tolerate  his  friend's  dishonesty.  Despite  the 
difficulties  of  rectitude  plus  being  his  brother's  keeper,  only  a  few 
cadets  violate  the  system  in  most  years.  A  major  problem  is  failure  of 
the  non-toleration  policy  associated  with  outbreaks  of  group  cheating  - 
six  outbreaks  in  thirty  years.  Four  programs  to  improve  implementation 
of  the  non-toleration  policy  are  offered.  If  future  evidence  shows 
continued failure of the non-toleration policy, an alternative criterion
is offered for consideration.

The Honor Code: The Optimum Criterion of Honor?


Clark  L.  Hosmer 
39  Longwood  Dr. 
Shalimar, FL 32579


My purpose is to assess the Honor Code as a criterion of the Honor
System at the United States Military Academy at West Point, New
York.  The  Honor  Code  is,  "A  cadet  will  not  lie,  cheat,  or  steal 
nor  tolerate  those  who  do." 

The  Honor  Code  serves  as,  "...  the  minimum  standard  of  behavior 
required  by  cadets.  Because  of  the  Code,  the  Corps  enjoys  an 
atmosphere  of  trust  and  the  reputation  of  being  honorable.  The 
Code  is  also  a  foundation  for  continual  development  of  a  cadet's 
personal standards and values." (U.S.M.A., 1979).

The Honor System is not 100% successful in converting all cadets
to honorable behavior. Usually fewer than one percent of the
cadets are found guilty of violating the Honor Code in a year.
Cadets come from all walks of life in a nation where six out
of ten teen-agers admit cheating on examinations (Gallup, 1979).
Some cadets are unable to make the transition to exemplary honorable
behavior. Not only must the cadet be honest, but the Honor Code
demands that the cadet not tolerate another cadet's violation of
honor. The cadet is expected to be as offended by a friend's violation
of honor as by his or her own violation. Woe to both the violator
and the tolerator. Both are subject to dismissal from the
academy. "He that worketh deceit shall not dwell within my house...
nor tarry in my sight." (Psalms 101:7).

Except for the few violators, the cadets who survive the highly
competitive regime of academic studies and military training develop
a camaraderie of trust among themselves. They become imbued
with the rigor of the Honor System that demands rectitude despite
discomfort or cost to self or others. The Honor Code becomes a
shared moral imperative and criterion for service described by West
Point's motto, "Duty Honor Country." Graduates often describe the
Honor Code as the single most powerful, enduring, and beneficial
influence that the academy has had on their lives.

Vulnerability to Group Cheating

An outbreak of group cheating has occurred at West Point six times
in the latest thirty years. In 1976, the group involved more than
... cadets. The episode was of such magnitude and complexity that
the Secretary of the Army appointed a special commission to study,
"... underlying causes in the context of the Honor Code and Honor
System and their place in the Military Academy." (Borman, p. 9).

Official reports cited toleration of honor violations as a major problem
for four years before the 1976 outbreak of group cheating. Then
in 1974 a survey of the cadets showed that 73% would not report a good
friend for a possible honor violation and 34% would not report a good
friend for a clear-cut violation. Forty-five percent of the cadets
wanted toleration removed as a personal honor violation (Borman, p. 14).
Although toleration was an obvious problem, for a period of ten years
before 1976, conviction of cadets for toleration alone was only one-half
of one percent of all convictions of honor violations (Borman, p. 17).
That finding suggests that toleration of toleration was almost the rule.

Toleration of another's honor violation creates vulnerability of the
Corps of Cadets to group cheating. Before the 1976 episode, the
Superintendent's Honor Review Committee reported, "... any cheating scandal
would find its beginning in a 'toleration' situation, i.e., a cadet
would observe a friend or roommate cheating but because of their
closeness would not report the incident. From that point a vicious chain
would gradually find its way to other cadets." (Borman, p. 7).

The  evidence  cited  in  the  Borman  report  suggests  that  the  Honor  Code 
may not be the optimum criterion for the Honor System. The non-toleration
clause was ineffective, which opened the door to possible group
cheating. 


To Improve the Effectiveness of the Non-Toleration Clause


The  four  following  programs  may  further  the  progress  already  made  by 
the  Corps  of  Cadets  and  the  academy's  superintendent  and  staff  and 
faculty toward improving the Honor System. The objective of the programs
is to reduce the rate of cadet readiness to tolerate another's
honor  violation  and  thereby  reduce  vulnerability  of  the  system  to 
group  cheating. 


1. A program to collect anonymous responses of a random sample
of cadets to well designed items that are repeated every quarter, plus
special items on problems that come up, would provide valuable objective
data. Significant trends in strengthening of, or trouble in, attitudes
toward non-toleration would be on the table for handling. Another
source of data could be coverage of the non-toleration policy in the
exit interview of every cadet who leaves the academy, whether for academic
deficiency or honor violation or other reason. Due to the freedom
the departing cadet has to ventilate privately held views, the exit
interview would collect information not anticipated by the regular items
used in the quarterly surveys. Strong views associated with reasons for
departure could be assessed for their influence on answers to the regular
items. A third source of information could be invited papers by cadets
in professional ethics and leadership classes. If each cadet were invited
each year to submit a paper on any aspect of the Honor System that he or
she selects, the range of system strengths and problems, let alone
suggestions, would provide trends in elements faring well and those that
warrant special attention of the Cadet Honor Committees and perhaps
the academy administration as well. Four thousand papers would be
mountainous, of course, but sorting of topics and harvesting of
suggestions on pivotal subjects like non-toleration would, in my view,
justify the efforts by cadets and analysts.

2. A program of demonstration by staged skits would help to
exercise the new cadets in handling the problems of identifying
a violation, counseling with the violator to make sure of the facts,
and seeing that one or both report the event to an Honor Representative.
Skits based on historical cases, without identification of persons, would
heighten realism. Role-playing the world of cadet life, to include
upperclassman roles, would exercise upperclassmen, too, in handling
the classic tough case of seeing a close friend, or for many the even
tougher case of seeing a supervisor, violate the Honor Code. The skill
required to react promptly with tact and yet with effective force does
not come from exhortation to do so. Skits have unusual powers in terms
of illustrating a range of problems and developing confidence to confront
rare events.

3. A program of new and old cadets in liaison groups, with a minimum
of direction by a supervisory level chairman, could provide self-convincing
opportunities for understanding the imperativeness of the non-toleration
policy. The most effective way for people to understand and accept new
ideas is in free-for-all discussion of all the pros and cons. Most of
the groups of people who were allowed to talk freely among themselves
about the pros and cons of a system chose loyalty to group interests.
Without complete freedom of discussion in confronting a new system,
however, 75% of small groups chose to strive for individual interests.
The individual competition reduced gains for individuals as well as for
the groups as a whole (iidney, p. 84).


4. A program to publicize every quarter selected findings from
the information garnered by the activities above could keep the Corps
of Cadets focused on implementation of the non-toleration policy. The
suggestion of publicity is controversial. If a survey shows a third
of the cadets are skeptical about non-toleration, would reporting that
finding to all the cadets suggest to some of them that they may as well
join the tolerators? Whatever the level of skepticism, to share that
information within a program aimed at lowering vulnerability to group
cheating, in my view, would pay in the long run. Could leakage to the
public media about a third of the cadets being skeptical hurt the Corps?
The national furor about the 1976 cheating scandal was a blow, but media
reactions included staunch defenses of the splendid reputation for honor
in the Corps and assurances that soon again the solid reputation would be
warranted.

The problem of vulnerability to group cheating is too important, in my
view, to use usual staffing of solutions and announcement of what is to
be done. The Honor System is the cadets', as backed up by the
superintendent and his staff and faculty. Unless the cadets accept the system
... will continue. The best bet, in my view, for keeping vulnerability
at a minimum includes the cadets themselves knowing the objective
score. For the most part they will solve the problems themselves.
Exhortation will not solve the problem. Concerted internal Corps
programs would seem to be the most effective approach.

Is an Alternative Criterion Available?

There is an old saying, "If it ain't broke, don't try to fix it."
If, however, future evidence shows that the non-toleration policy is
fully accepted by only, say, 75% of the cadets, would consideration
of an alternative criterion for the Honor Code be warranted? In case
objective evidence of cadet determination to implement the non-toleration
policy is less than the level that the Corps sets to prevent
group cheating, I submit a possible alternative Honor Code for
consideration.

On its establishment in 1802, West Point adopted a code, the core of
which was, "A cadet is fundamentally honest and therefore accepted
at his word." (U.S.M.A., 1979, p. 6). Those words show the compelling
reason for, and the result of, being honest. Also, the statement
is positive. In my view the affirmative has power that prohibition
of lying, cheating, and stealing does not match. "Fundamentally
honest," is broad-brush as contrasted with the specificity of the
prohibitions in the present Code. Honor itself lacks specificity.
The original Code may be more enlightening for its scope than the
short inventory of three evils featured today among the many such as
bribery, fraud, forgery, plagiarism, reneging on a commitment, writing
a bad check, etc.

If the original Code were re-adopted, or some variation of it, the
non-toleration clause would no longer be a part of the Honor Code.
The non-toleration policy could continue, however, and perhaps more
effectively. No longer would the stretching of the meaning of one's
personal honor to include what one does about somebody else's violation
be an integral part of the creed. Remember that 45% of the cadets
indicated in an official survey that they wanted toleration removed as
a personal violation. Those cadets may have been objecting to the
specific point of stretched personal honor.

In lieu of the non-toleration clause, there is an alternative expression
for the offense of tolerating another's honor violation. The
word is, "misprision." To conceal a crime that one is not himself
guilty of, to fail to report an offense committed by somebody else,
to aid and abet a criminal in avoiding justice - that is misprision.
A cadet who tolerates another's honor violation could be charged with
misprision and, if convicted, levied heavy penalties.

The penalty for toleration was set when the non-toleration clause was
formally added to the Honor Code in 1970, in confirmation of the informal
practice of the non-toleration policy that, soon after the turn of the
century, fixed dismissal as the usual sentence. But the evidence suggests
that toleration flourished despite the severity of the penalty. The
result is as though a majority of the cadets challenged the justice
of dismissal for every level of toleration. Dismissal may be appropriate
for a tolerator who is an accessory before the fact of another's honor
violation by cheating, and aids and abets the cheater's avoidance of
justice after the fact. The tolerator who counsels a thief to return
the stolen item, however, or who ignores his friend's unauthorized
possession of a government pen, might be levied punishment short of
banishment from the academy, in my view. Punishment to fit the crime
of toleration at its various levels of seriousness might well increase
cadet reporting of tolerators in a reversal of past indulgence of them.


Conclusion 

The evidence suggests that the non-toleration clause of the Honor Code
had not been effective in convincing more than a third of the cadets
in 1974 to accept the non-toleration policy. Implementation of the
non-toleration policy was ineffective for a period of ten years before
1976. Reliable and objective estimates of cadet skepticism about the
non-toleration policy can be made available. Programs to increase
cadets' full acceptance and implementation of the non-toleration policy
are proposed in order to reduce vulnerability of the Honor System
to group cheating. If all actions fail to reduce vulnerability of
the system to group cheating under the present Honor Code, an alternative
criterion code would be useful to consider. The original code
of the Military Academy is proposed for consideration if no other
action produces relative invulnerability to groups of cadets violating
the Honor System.

Previous experience with innovative computerized systems
suggests  that  the  introduction  of  automated  counseling  procedures 
into  the  military  environment  will  be  attended  with  resistance 
and  an  initial  low  level  of  acceptance  by  users.  However,  user 
acceptance  of  these  new  systems  can  be  improved  if  concepts  from 
the  technology  of  planned  change  and  software  psychology  are 
integrated  into  the  implementation  process. 

A  consensual  model  of  the  change  process  is  presented.  This 
model  structures  organizational  readiness,  the  change  strategy, 
and  acceptance  as  integrated  variables  to  be  considered  in  a 
successful  installation. 

A  basic  introduction  to  the  principles  of  software  psychology 
is  presented.  Past  research  in  the  installation  of  psychology 
related  computer  systems  is  considered  in  terms  of  these  principles. 

A  structured  approach  for  the  installation  of  automated 
military  counseling  procedures  is  proposed.  This  approach  takes 
into  account  issues  of  planned  change  and  software  psychology  as 
well  as  previous  practical  experience  in  the  area.  A  preliminary 
organizational  assessment  instrument  for  use  in  the  installation 
of  automated  counseling  procedures  is  described. 
In order to successfully install a computerized counseling
system, it is necessary to consider a range of factors that can
affect the project's outcome. The assessment of organizational
readiness, structured implementation procedures, strategies for
handling resistance, and system design features are all crucial
variables in this process. The purpose of this paper is to
briefly review these areas to acquaint interested psychologists
with a number of new developments in the field.


Organizational  Readiness 

The  new  technology  of  planned  change  emerged  from  a  rational 
analysis of the forces that encourage and impede change. When an
innovation is planned, the forces within the "target" organization
must be understood and then developed and channeled toward a successful
implementation process. Davis (1971) used the acronym A VICTORY
to describe the dimensions of organizational readiness that a
"change agent" must consider in designing a plan for action. The
target group must have the ability, that is the skills and resources,
necessary to change. The innovation must be consistent
with the values and operating style of the target. Information
about the innovation must be provided. Other circumstances of
the target should be considered so that the timing of the plan
is optimal. Important personnel must feel the need or obligation
to change. Sources of resistance to change must be anticipated.
The yield, that is the benefits and cost-effectiveness, should be clarified.
Byrnes and Johnson (1981) constructed a questionnaire based on
Davis's A VICTORY model to assess an organization's readiness for an
automated system. Table 1 summarizes the dimensions of this
questionnaire.
Table 1. A VICTORY Dimensions for Organizational Readiness
for Automation

Category        Item  Dimension

Ability           1   Emergence of a system's advocate, that is, someone
                      who identifies the need for an automated system and
                      will work to push it through.
                  2   Power base of the organization is supportive of
                      implementation.
                  3   Availability of resources for automation.
                  4   Administrative support and cooperation.
                  5   Attitudes and beliefs of users toward automation.

Values            6   Organizational and administrative style.
                  7   Organization's flexibility and openness to change.

Information       8   Availability of appropriate computer programs or
                      computer science expertise.
                  9   Presence of comparable manual system.

Circumstances    10   A climate of trust is needed so that communication
                      can be facilitated.

Timing           11   High priority is being given to the automation
                      project.

Obligation       12   Sources of motivation are present, internal and/or
                      external, for automation.

Resistance       13   Fears of confidentiality, performance appraisal, and
                      mechanization. Expected negative consequences of
                      automation.

Yield            14   General staff and administration both foresee the
                      benefits of automation.
Change Strategies

Strategies to effect planned change should be chosen and
combined according to the assessed characteristics of the target
organization. Power strategies, that is the imposition of
sanctions and rewards to change behavior, are recommended when
staff commitment to the project is low and/or the time frame is
limited. Persuasive strategies, convincing the target group
that the change serves its purposes, are suggested when the
need or problem is not recognized or the innovation is not considered
acceptable. Reeducational strategies, aimed at changing
attitudes and values, are useful when commitment to the change
is minimal and a long time frame is possible. Facilitative
strategies, which provide assistance for easier implementation,
are recommended as an augmentation to other strategies (Zaltman
& Duncan, 1977). Table 2 suggests types of issues to be considered
in choosing strategies.
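
The selection rules in the preceding paragraph can be sketched as a small decision function. This is only an illustration: the class TargetAssessment, its field names, and the function recommend_strategies are hypothetical, and coding each organizational characteristic as a yes/no value is an assumption made for the example rather than part of the cited framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TargetAssessment:
    """Assessed characteristics of the target organization (hypothetical fields)."""
    commitment_low: bool         # staff commitment to the project is low
    time_limited: bool           # only a short time frame is available
    need_recognized: bool        # staff recognize the need or problem
    innovation_acceptable: bool  # staff consider the innovation acceptable


def recommend_strategies(a: TargetAssessment) -> List[str]:
    """Map an assessment to the strategy types described in the text."""
    strategies = []
    # Power: commitment is low and/or the time frame is limited.
    if a.commitment_low or a.time_limited:
        strategies.append("power")
    # Persuasion: the need is not recognized or the innovation is not accepted.
    if not a.need_recognized or not a.innovation_acceptable:
        strategies.append("persuasion")
    # Reeducation: commitment is minimal and a long time frame is possible.
    if a.commitment_low and not a.time_limited:
        strategies.append("reeducation")
    # Facilitation augments whatever else is chosen.
    strategies.append("facilitation")
    return strategies
```

For an organization with low commitment, a long time frame, and a recognized and accepted innovation, the sketch returns power, reeducation, and facilitation strategies in combination.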

Previous reports about attempts to install automated procedures
for psychological purposes suggest that failure to consider
such factors has often led to serious installation
problems and even rejection of innovative systems (Johnson,
Williams, Giannetti, Klingler, & Nakashima, 1978; Hedlund,
Sletten, Evenson, Altman & Cho, 1977; St. Clair, Siegel, Caruso
& Spivack, 1976; Leader & Klein, 1977; Landau & Wilkes, 1975).


Software  Psychology 

To facilitate acceptance of an automated system, the users'
comfort and ease in operating the computer are important. Certain
design features can be varied to develop a system appropriate to
the users' level of experience with computers. More "powerful"
commands and more "flexible" commands (that is, commands that permit
the user to perform an operation with fewer instructions or
with a variety of types of commands, respectively) are less demanding
of users (Ramsey & Atwood, 1979). Such command patterns
should be balanced by a low level of "complexity" so that only
a few commands need to be used at one time (Stewart, 1976).
Computer-initiated dialogue reduces the amount of learning
necessary to operate the computer, thereby reducing errors.
Examples of computer-initiated dialogue are "yes or no" questions,
"form-filling" (short answer), and "menu selection" (multiple
choice) (Thompson, 1971; Martin, 1973). Although appropriate
for inexperienced users, such procedures restrict the communication
of information. To accommodate increasingly experienced users, a
"dual mode dialogue" which allows switching to a user-initiated
approach is recommended (Pew & Rollins, 1975).
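
A minimal sketch of two computer-initiated dialogue styles, "yes or no" questions and "menu selection", is given here for illustration. The function names (parse_yes_no, render_menu, parse_menu_choice) and the accepted answer forms are assumptions introduced for the example; they are not taken from any of the systems cited above.

```python
from typing import List, Optional


def parse_yes_no(answer: str) -> Optional[bool]:
    """Interpret a reply to a yes/no question; None means 'ask again'."""
    text = answer.strip().lower()
    if text in ("y", "yes"):
        return True
    if text in ("n", "no"):
        return False
    return None


def render_menu(prompt: str, options: List[str]) -> str:
    """Build the numbered menu text that the computer presents to the user."""
    lines = [prompt]
    lines += [f"  {number}. {option}" for number, option in enumerate(options, start=1)]
    return "\n".join(lines)


def parse_menu_choice(answer: str, options: List[str]) -> Optional[str]:
    """Return the option selected by number, or None if the reply is invalid."""
    text = answer.strip()
    if text.isdecimal() and 1 <= int(text) <= len(options):
        return options[int(text) - 1]
    return None
```

Because the computer supplies every prompt and accepts only a small set of replies, the user has little to learn, which is the property the text attributes to computer-initiated dialogue. A dual mode system would add a route by which experienced users bypass these prompts.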

Programs  can  be  designed  to  give  messages  to  the  user  to 
guide interactions. Ambiguous input can be resolved by presenting
the  user  with  alternative  interpretations  from  which  he/she  can 
choose  further  commands  (Plath,  1972;  Codd,  1974).  Patterns  of 
errors  can  be  detected  and  "instruction"  for  correcting  such 
errors  can  be  presented  (Rouse,  1977).  The  pace  of  all  such 
guidance  should  be  programmed  to  be  under  the  user's  control 
(Goodwin,  1974). 
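
One way to resolve ambiguous input by offering alternative interpretations is sketched here. The command list, the prefix-matching rule, and the function names are hypothetical choices made for the illustration; the cited authors may have used quite different methods.

```python
from typing import List

# Hypothetical command set for a counseling terminal.
COMMANDS = ["score", "save", "summary", "quit"]


def interpret(user_input: str, commands: List[str] = COMMANDS) -> List[str]:
    """Return every command that an abbreviated input could stand for."""
    text = user_input.strip().lower()
    if text in commands:
        return [text]  # an exact match is never ambiguous
    return [c for c in commands if text and c.startswith(text)]


def respond(user_input: str) -> str:
    """Produce the guidance message shown to the user."""
    matches = interpret(user_input)
    if len(matches) == 1:
        return f"Running '{matches[0]}'."
    if matches:
        # Ambiguous: present the alternatives and let the user choose.
        return "Did you mean: " + ", ".join(matches) + "?"
    return "Command not recognized. Type one of: " + ", ".join(COMMANDS)
```

The message is returned rather than printed so that the calling program can display it only when the user is ready, in keeping with the recommendation that the pace of guidance remain under the user's control.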
Table 2. Sample Questions for Surveying Change Strategies Used
During Implementation Processes


Power/Coercion

-Were top administrators used effectively before and during the
implementation process?

-Was the administrative staff well educated about computer approaches?

-Would the chief administrator have penalized an employee who failed
to cooperate with the needs of the automated system?

-Was the chief administrator prepared for resistance to change prior
to the implementation process?

Persuasion

-Were staff involved in all stages of the design and implementation
process?

-Were there full-time liaison personnel to work with the staff?

-Were computer science trained personnel carefully exposed to
psychologists' concepts prior to beginning work on the project?

-Were staff opinion leaders identified and involved in the
installation of the computer system?

Reeducation 

-Were  numerous  meetings  and  workshops  held  between  the  prospective 
users  and  project  staff? 

-Were  key  staff  members  given  training  in  data  processing? 

-Were  members  sent  to  other  facilities  with  automated  systems  for 
purposes  of  education? 

-Was  a  computer  project  staff  member  assigned  responsibility  for 
user  education? 


Facilitation 

-Was  the  project  given  adequate  financial  support? 

-Did  the  design  of  the  system  attempt  to  minimize  organizational 
change? 

-Were  chief  administrators  supporting  the  project? 

-Were additional staff provided to allow sufficient total involvement
during implementation?
When recruits or enlisted personnel are to use the computer,
special care must be taken to ensure that collected data are
accurate. Personnel unable to use a computer should be screened
out. Johnson et al. (1977) have constructed such a screening
device. A period of orientation and practice should be allowed
before data collection begins. Standardized responses and
specially marked or color coded keys are suggested to make
computer interaction easier. Design should allow for storage
of partial data if a user is unable to complete a program in
a single sitting (Cole, Johnson & Williams, 1976).
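
The recommendation to store partial data can be illustrated with a short sketch. The record layout, the function names checkpoint and resume, and the use of JSON text as the storage format are all assumptions made for this example.

```python
import json
from typing import Dict, List, Tuple


def checkpoint(user_id: str, answers: Dict[str, str]) -> str:
    """Serialize the answers collected so far, to be written to storage."""
    return json.dumps({"user_id": user_id, "answers": answers}, sort_keys=True)


def resume(saved: str, question_ids: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Restore a saved session and list the questions still unanswered."""
    record = json.loads(saved)
    answers = record["answers"]
    remaining = [q for q in question_ids if q not in answers]
    return answers, remaining
```

A user interrupted after the first question would, on returning, be presented only with the remaining questions, and the earlier answer would not have to be entered again.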

A  design  that  allows  staff  easy  access  to  stored  information 
expands  the  usefulness  of  the  system  for  purposes  of  monitoring, 
documentation,  program  evaluation,  and  planning.  System  monitoring 
programs  provide  records  of  errors  for  ongoing  evaluation  and 
refinement  of  the  system. 

Approaches found to facilitate interpretation of output data
are: graphics (Schniederman, 1980), data summary displays (Ramsey
& Atwood, 1979), coding by color, brightness, size and grouping
(Stewart, 1976; Teichner, Christ & Corso, 1977), and flashing
important data (Smith & Goodwin, 1971). Output of printed reports
can be varied in size, type, and style according to the users'
needs. Alternative sentence stems can be varied so that each
report is structurally unique. Retrofit programming techniques
(fitting data from disparate collection procedures backwards
around key data) produce unified reports that readers seem to
prefer (Johnson et al., 1977).

Important physical design features include: alternative
keyboards to accommodate typists and non-typists (Schniederman, 1980;
Hirsch, 1976), computer response time appropriate to the user's
perception of task difficulty (Schniederman, 1980), yellow displays
to reduce eye strain (Ramsey & Atwood, 1979), and elimination
of flicker, which causes fatigue and dissatisfaction (Dill &
Gould, 1970). Detailed design guidelines applicable to psychological
settings have been developed by Hanson (1971), Wasserman
(1973), Pew & Rollins (1975), Gaines & Racey (1975), and Schniederman
(1980).

All  factors  facilitating  user  ease  of  operation  and  comfort 
with  the  computer  should  be  considered  to  enhance  efficiency  and 
encourage  acceptance  of  the  system.  These  considerations  are  an 
important  part  of  the  facilitative  implementation  strategy. 


Suggested  Implementation  Procedures 

Keeping  in  mind  the  organizational  readiness,  strategy,  and 
design  features  discussed  above,  the  following  step-by-step 
implementation  procedure  is  suggested. 

Step 1: Assessment. Systems developers should review the
change technology and software psychology literature in order to
acquire the orientation and skills (information and ability)
necessary to contribute to the design of an implementation plan.
The organizational readiness instrument (described above) is
suggested to determine potential problems and resistances. The
relationship between permanent personnel and temporary implementation
staff should be clarified to alleviate resistance due to fears
of loss of power (Jones, 1969). Meetings should be scheduled to
obtain design input from staff (values), to delineate the problems
of the current system (establish obligation), and to present the
rationale for an automated solution (demonstrate yield). These procedures
should ensure optimum participation by opinion leaders (who can
best evaluate circumstances and plan timing). Opinion leaders
need to become "insiders" who are supportive of the proposed
change and who can use their influence over other personnel to
overcome resistances (Zaltman & Duncan, 1977).

Step 2: Plan design. The goals of the program must be
clarified to make them consistent with the organization's values
and to form the basis for later evaluations. Software and hardware
should be selected based on organizational needs and the
facilitation of user acceptance. Duties of all personnel must
be delineated. Procedures for feeding information about problems
back to superiors should be described. In this manner a foundation
can be developed for the rewards and sanctions of a power strategy.

Expressions  of  resistance  are  useful  information  in  evaluating 
and  refining  design  of  the  system.  Resistance  that  is  not  openly 
expressed,  but  is  only  acted  upon,  is  a  serious  implementation 
problem. Thus, it is necessary to define procedures for constructive
expression of resistance. Planning for and accepting
resistance in a positive manner will increase involvement of
personnel in the project, shift social support toward project
success, discourage passive resistance, and provide necessary
information for program evaluation and modification (Kelman
& Warwick, 1973; Johnson et al., 1978).

Details about scheduling and training groups should be determined
in advance to avoid disruption of organizational
functioning  and  to  form  working  units  based  on  level  of  training 
needed  and  ability  to  provide  support  for  the  program's  success. 

Step 3: Training Personnel. Attendance at all training
meetings should be required (power). Arrangements need to be
made to adjust work loads during this period so that training
does not result in extra work (facilitation). Opinion leaders
should be used in an orientation program to explain the rationale
for the automated system (persuasive/reeducative strategy) and
to allow staff to become familiar with the new equipment in a
casual way (facilitation). Training sessions should be presented
in small, successive steps to guarantee learning paced
to the users' skill level. Extensive practice sessions often
need to be scheduled (St. Clair et al., 1976).

Step 4: Ongoing training and evaluation. Temporary staff
should be phased out and/or used on a consulting basis during
this period. Monthly meetings and workshops need to be
scheduled to solicit feedback regarding resistance and to inform
personnel of progress and adjustments in the system, thus
demonstrating yield and responsiveness to values expressed
(Hedlund et al., 1977). These meetings provide a forum for
continued persuasion and re-education of the staff over long
periods of time, and they facilitate the program by continually
increasing user skills.


Summary 


To  summarize,  there  are  a  number  of  factors  related  to 
the  successful  implementation  of  a  computerized  military 
counseling system that need to be considered. By assessing
organizational readiness and carefully structuring implementation
procedures, a plan can be developed that will have the
greatest probability of acceptance and success. It is recommended
that all developers of computerized approaches pay
careful attention to these factors if they wish maximum
acceptance of their systems.
All applicants for military service are required to achieve a minimum score
on the Armed Forces Qualification Test (AFQT) to be eligible for enlistment. The
AFQT is a composite score derived from four of the subtests of the Armed Services
Vocational Aptitude Battery (ASVAB). Applicants for the Army also have to achieve
scores on Army Aptitude Area Composites, which are various combinations of
the ASVAB subtests.

The traditional Enlistment Screening Test (EST) used by recruiters for all
military services to screen service applicants for potential failure has been
designed to predict the AFQT score (Bayroff, Thomas & Kehr, 1959; Jensen &
Valentine, 1976; Mathews, 1981; Morton & Houston, 1957). The U.S. Army Research
Institute (ARI) has developed a Pre-enlistment Recruiting Test (PERT) to predict
ASVAB Army Aptitude Area Composite scores, as well as scores on the AFQT. The
PERT mini-battery is a shorter but parallel version of the operational ASVAB.

PERT  scores  will  be  used  within  the  Army's  new  Joint  Optical  Information  Network 
(JOIN)  system,  a  system  using  mini-computers  with  video  display  capabilities  for 
use  in  Army  recruiting  stations.  PERT-derived  Aptitude  Area  scores  will  allow 
a recruiter to discuss particular Military Occupational Specialties (MOS) for
which  an  applicant  might  be  most  qualified. 

This  paper  presents  preliminary  results  of  the  criterion-related  validation 
of  the  PERT  against  the  operational  ASVAB. 


Method 


Instrument 

The PERT was developed during the fall of 1980. Table 1 presents the
correspondence between the subtests of the PERT and the operational ASVAB 8/9/10.
The 80 items comprising the eight non-speeded PERT subtests were collected as
follows. Forty items were from ASVAB Form 2 and four items from ASVAB Form 1.
Twenty-six items were original, and ten Paragraph Comprehension items were obtained
from a cross-section of three experimental Enlistment Screening Test (EST)
booklets (Mathews, 1981). All of the 72 Coding Speed (CS) items were original.
None of the PERT items is in the operational ASVAB.

There are three differences between the PERT and ASVAB. First, the PERT
has no equivalent subtest for Numerical Operations (NO). The Coding Speed (CS)
subtest in PERT substitutes for both the CS and NO in the operational ASVAB.
While the correlation between CS and NO in the operational ASVAB is only .64,
the patterns of correlations between these two speeded subtests and the
non-speeded ASVAB subtests are quite similar (see Table K-3 in Sims and Truss, 1980).
Second, the PERT subtests have fewer items than the ASVAB subtests. Third, the
items for the PERT subtests are not presented as separate content areas nor are
they independently timed. The one exception is the PERT subtest for Coding
Speed, which is presented in a separate booklet (Test Book II) and has a five
minute time limit. The other PERT subtest items are sequentially presented in
Test Book I such that three items from a subtest are followed by three items
from the next subtest. This sequencing continues until nine items from each
of the subtests are presented. The last eight items of the 80 items in Test
Book I are the successive presentations of the 10th items in each of the eight
PERT subtests. The major advantage is that the recruiter does not need to monitor
time limits nor give specific directions for each subtest. The examinee is
provided with initial directions and example problems and allowed to complete
Test Book I (all subtests except CS) in 50 minutes. In the event a slower
examinee does not finish, each subtest should be affected about equally.
The examinee is then provided with Test Book II, and allowed five minutes
to answer the 72 CS items.
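
The item sequencing described for Test Book I can be sketched as follows. The order in which the eight subtests rotate is not stated in the text, so the order in SUBTESTS is an assumption, as is the function name book_one_sequence.

```python
from typing import List, Tuple

# The eight non-speeded PERT subtests; the rotation order is assumed.
SUBTESTS = ["WK", "AR", "PC", "MC", "EI", "AS", "MK", "GS"]


def book_one_sequence(subtests: List[str] = SUBTESTS) -> List[Tuple[str, int]]:
    """Return (subtest, item number) pairs in presentation order."""
    order = []
    # Three rounds of three items per subtest cover items 1 through 9.
    for first_item in (1, 4, 7):
        for subtest in subtests:
            for item in range(first_item, first_item + 3):
                order.append((subtest, item))
    # The last eight positions hold the 10th item of each subtest.
    for subtest in subtests:
        order.append((subtest, 10))
    return order
```

Under this layout an examinee who stops early has answered nearly the same number of items in every subtest, which is why a slow examinee affects each subtest about equally.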

Procedure 

In March of 1981, the U.S. Army Recruiting Command (USAREC) tasked each
of the 57 District Recruiting Commands (DRC's) to select 10 of the recruiting
stations in their districts to participate in the research project. This would
yield a total of 570 recruiting stations. They were selected in the following
manner. First, all recruiting stations which had station commanders who were
not on production were identified. (Non-production station commanders are not
directly involved in soliciting applicants.) Second, within each DRC, up
to 10 recruiting stations were randomly selected. The DRC's and their designated
recruiting stations participated in the study until each station commander
tested 15 applicants and forwarded to the DRC 15 complete and usable answer
sheets. Each DRC then forwarded all answer sheets to ARI.
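
The two-stage station selection can be sketched as a filter followed by a random draw within each DRC. The data layout (a dictionary of station records with a commander_on_production flag), the function name, and the fixed random seed are assumptions made for the illustration; the actual selection procedure used by USAREC is not described in further detail.

```python
import random
from typing import Dict, List


def select_stations(stations_by_drc: Dict[str, List[dict]],
                    per_drc: int = 10,
                    seed: int = 0) -> Dict[str, List[str]]:
    """Randomly pick up to per_drc eligible stations within each DRC."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    selected = {}
    for drc, stations in stations_by_drc.items():
        # Stage 1: keep stations whose commander is not on production.
        eligible = [s["name"] for s in stations
                    if not s["commander_on_production"]]
        # Stage 2: draw at random, taking all stations if fewer than per_drc.
        selected[drc] = rng.sample(eligible, min(per_drc, len(eligible)))
    return selected
```

With 57 DRC's and 10 stations each, the draw yields at most 570 stations, and fewer wherever a DRC has under 10 eligible stations.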


Table 1

Correspondence Between the PERT and
ASVAB Subtests

                                                                          Number of Items
Subtest                          Content                                   ASVAB    PERT

Word Knowledge (WK)              Understanding the meaning of words,         35      10
                                 i.e., vocabulary

Arithmetic Reasoning (AR)        Word problems emphasizing mathematical      30      10
                                 reasoning rather than mathematical
                                 knowledge

Paragraph Comprehension (PC)     Understanding the meaning of paragraphs     15      10

Numerical Operations (NO)        A speeded test of four arithmetic           50     None
                                 operations, i.e., addition, subtraction,
                                 multiplication and division

Mechanical Comprehension (MC)    Knowledge of general mechanical and         25      10
                                 physical principles

Electronics Information (EI)     Knowledge of electronics and radio          20      10
                                 principles

Auto-Shop Information (AS)       Knowledge of auto mechanics, shop           25      10
                                 practices and tool functions

Mathematics Knowledge (MK)       Knowledge of algebra, geometry and          25      10
                                 fractions

General Science (GS)             Knowledge of the physical and               25      10
                                 biological sciences

Coding Speed (CS)                A speeded test of matching words            84      72
                                 and numbers
During the last week in April, each of the designated station commanders
was sent a packet of materials which included: a letter explaining the purpose
of the research project; PERT test booklets and IBM scorable answer sheets; and
test administration instructions. The station commanders were instructed to
administer the PERT to all applicants. An individual was defined as an applicant
when a recruiter completed a USAREC Form 714 for the individual. (A Form 714 is
completed on applicants who are intending to be tested by the operational ASVAB.)
After that time, and before the applicant received the operational ASVAB, the
station commander administered the PERT and had the applicant record all responses
on a single answer sheet. The testing room normally used by the recruiters at
the recruiting station was used for this purpose.

Subjects 

This procedure yielded 2,921 answer sheets returned to ARI. Nineteen of
these were dropped due to missing social security numbers. The distributions
of the remaining 2,902 responses by DRC and Regional Recruiting Command (RRC)
are presented in Tables 2 and 3, respectively. A large portion of the PERT
testing was completed in May with the remainder finishing in June. Accordingly,
for data analyses the May applicants were treated as the developmental sample
and the June applicants as the validation sample. The criteria, the actual
ASVAB test results, were obtained from the Military Processing Command (MEPCOM)
for the May and June applicants. When the social security numbers were
matched against those on the MEPCOM tapes, 1,058 and 478 matches were found
for the May and June applicants, respectively. The applicants who had results
for tests other than ASVAB 8/9/10 were dropped, yielding developmental and
validation samples of 1,047 and 473, respectively. The distribution of these
two samples with respect to the DRC where PERT was administered is also presented
in Tables 2 and 3.

An inspection of Table 2 indicates that the applicants tested were
nationally distributed with no apparent regional biases. Table 3 summarizes
these data with respect to the five recruiting regions. Comparing the percent
of matched returns between the developmental and validation sample in each
region highlights a problem with using month of testing as the basis for dividing
the total sample into a developmental and validation sample. Since the majority
of the applicants in the Northeast and Southeast Regions were tested early
(i.e., in May), these two regions are over-represented in the developmental
sample and under-represented in the validation sample. Just the opposite
occurred for the Southwest, Midwest and Western Regions. While this unequal
distribution is unfortunate, it will serve to make the validation of the
regression weights computed in the developmental sample a more stringent test,
since the validation sample may be less similar to the developmental sample.
This will tend to decrease the size of the cross-validated regression
coefficients. To ascertain the extent to which the developmental and validation
samples differed, demographic characteristics and AFQT scores were examined
for each sample.
Table 2

The Number of Applicants Tested on the
PERT in Each District Recruiting Command (DRC)

                                     Number of Applicants
District                                      Matched with ASVAB
Recruiting            Tested
Command (DRC)         with PERT      Developmental    Validation

Northeast Region
Albany                    28              13              --
Baltimore                 33              15               4
Boston                    33               9               1
Concord                   25              16              --
Harrisburg                49              30               1
New Haven                 28               7              --
Long Island              103              37               1
Newburgh                  39              22              --
Ft. Monmouth, NJ          --              --              --
Niagara Falls             67              28              --
Philadelphia              81              28               2
Pittsburgh                29              14              --
Syracuse                  87              31              --
Total                    602             250               9

Southeast Region
Atlanta                   34              14              --
Beckley                   --              --              --
Charlotte                 75              24               3
Columbia                  76              17               3
Jacksonville              53              29               7
Louisville                12               8               5
Miami                [illegible]          40               6
Montgomery                5[?]            18               5
Nashville                 90              27              13
Raleigh                  108              37               2
Richmond                  15               6               7
San Juan                  85               9              19
Total                    687             229              70

Southwest Region
Albuquerque               31              11               3
Dallas                    67              29               6
Denver                    18               1               3
Houston                   43              17               8
Jackson                   34              13               5
Kansas City               46              28              10
Little Rock                7               8              15
New Orleans               31              18               6
Oklahoma City             64              18              17
San Antonio               43              15               9
Total                    384             158              82
Table 2 (continued)

                                     Number of Applicants
District                                      Matched with ASVAB
Recruiting            Tested
Command (DRC)         with PERT      Developmental    Validation

Midwest Region
Chicago                   27               6               9
Cincinnati                57              23              15
Cleveland                105              31              31
Columbus                  99              44              22
Des Moines                16               9               7
Detroit                   27              11               7
Indianapolis              60              27              16
Lansing                   39              11              11
Milwaukee                 72              30              17
Minneapolis               56              21              17
Omaha                     36              17              12
Peoria                    78              24              26
St. Louis                 56              21              11
Total                    728             275             201

Western Region
San Francisco             72              32          [illegible]
Honolulu                  --              --              --
Los Angeles               40              14          [illegible]
Phoenix                   72              17          [illegible]
Portland                  23               5          [illegible]
Sacramento                59              10          [illegible]
Salt Lake City            25              11          [illegible]
Santa Ana                 89              21          [illegible]
Seattle                   45              19          [illegible]
Total                    425             129             107

Note: The number of applicant test results with unidentifiable
DRC was 76 for the total number of applicants tested, 6
for the developmental sample, and 4 for validation sample.
Table 3

The Number of Applicants Tested on the
PERT in Each Regional Recruiting Command (RRC)

                                 Number of Applicants
Regional                                  Matched with ASVAB
Recruiting          Tested
Command (RRC)       with PERT      Developmental     Validation

Northeast              602           250 (42%)         9 (1%)
Southwest              384           158 (41%)        82 (21%)
Southeast              687           229 (33%)        70 (10%)
Midwest                728           275 (38%)       201 (28%)
Western                425           129 (30%)       107 (25%)

Total                2,826         1,041 (37%)       469 (17%)
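
The percentages in Table 3 can be reproduced from the counts, and the regional imbalance between the two samples can be made explicit by computing each region's share of each sample. The sketch below is illustrative; the names TABLE_3, match_rates, and sample_shares are introduced for the example only.

```python
from typing import Dict, Tuple

# (tested with PERT, matched developmental, matched validation), from Table 3.
TABLE_3: Dict[str, Tuple[int, int, int]] = {
    "Northeast": (602, 250, 9),
    "Southwest": (384, 158, 82),
    "Southeast": (687, 229, 70),
    "Midwest": (728, 275, 201),
    "Western": (425, 129, 107),
}


def match_rates(table: Dict[str, Tuple[int, int, int]]) -> Dict[str, Tuple[int, int]]:
    """Percent of tested applicants matched in each sample, by region."""
    return {region: (round(100 * dev / tested), round(100 * val / tested))
            for region, (tested, dev, val) in table.items()}


def sample_shares(table: Dict[str, Tuple[int, int, int]]) -> Dict[str, Tuple[float, float]]:
    """Each region's percent share of the developmental and validation samples."""
    dev_total = sum(dev for _, dev, _ in table.values())
    val_total = sum(val for _, _, val in table.values())
    return {region: (100 * dev / dev_total, 100 * val / val_total)
            for region, (_, dev, val) in table.items()}
```

Comparing the two shares for a region shows directly whether it is over-represented or under-represented in the developmental sample relative to the validation sample.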
The distribution of both samples with respect to the demographic variables
of Gender, Race and Education, information available from the MEPCOM tapes, is
presented in Table 4. The validation sample includes a slightly higher proportion
of whites than the developmental sample, i.e., 71% vs. 61%. The proportion
of males in both samples is identical, 78%. The distribution of applicants by
education for the developmental sample is 41% for high school graduates, 14%
for high school seniors and 45% for those with General Educational Diplomas (GED).
The corresponding distribution for the validation sample is 47%, 11% and 42%.
Thus, the developmental and validation samples are similar with respect to
demographic characteristics.

A  critical  dimension  of  both  samples  is  the  range  of  scores  on  the  AFQT 
portion  of  the  operational  ASVAB.  Table  5  reveals  that  the  developmental  and 
validation samples are similar with respect to AFQT scores. Both samples include
a large proportion (71% and 67%) of applicants who scored below the 50th
percentile.  These  proportions  are  fortuitous  for  our  purposes  because  the 
PERT  predictions  will  be  most  useful  for  those  applicants  in  the  lower  ability 
levels  who  may  not  qualify  for  all  MOS.  In  general,  comparisons  between  the 
developmental  and  validation  samples  indicate  that  there  does  not  appear  to 
be  any  major  difference  between  the  two  samples,  even  though  the  two  samples 
were  not  equally  distributed  among  the  five  Recruiting  Regions. 

Analyses 

To examine some psychometric properties of the PERT, subtest scale means,
standard deviations and Cronbach's coefficient alphas (an index of internal
consistency reliability) were computed in the developmental sample. The
validation of the PERT was accomplished by computing eleven separate regression
equations in the developmental sample. For each regression analysis, the
PERT subtest raw scores served as the predictor variables and each of the
ten Army Aptitude Area Composites and the AFQT successively served as the
criterion variable. Two features of these regression analyses need to be
explicated. First, the PERT was designed to predict Army Aptitude Area Composites,
not ASVAB individual subtest scores. The ten Aptitude Area Composites with
corresponding MOS are presented in Table 6. Second, one PERT subtest, Coding
Speed (CS), was excluded as a predictor in the regression analyses. Preliminary
analyses of the distribution of CS scores showed many high scores. The
distribution indicated that about half of the applicants were apparently allowed
to respond beyond the 5-minute time limit of the subtest, thus invalidating the results.
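The internal consistency index mentioned above can be illustrated with a
minimal sketch. It uses the standard formula for Cronbach's coefficient alpha;
the function name and the small response matrix are hypothetical and are not
PERT data.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of examinee rows of item scores (e.g. 0/1)."""
    n_items = len(item_scores[0])
    # Variance of each item across examinees.
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(n_items)]
    # Variance of the examinees' total scores.
    total_var = pvariance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical right/wrong responses of five examinees to a four-item scale.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha rises as the items covary more strongly relative to their individual
variances, which is why it is read as an index of internal consistency.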

Once  the  regression  equations  were  computed  in  the  developmental  sample, 
the  regression  coefficients  (non-standardized  beta  weights)  were  used  to  compute 
predicted ASVAB composite scores in the validation sample. The correlation
between  the  predicted  and  actual  ASVAB  composite  scores  in  the  validation 
sample  constituted  the  "validated"  multiple  R. 
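The cross-validation logic described above can be sketched as follows. This is
an illustration only: the two-predictor data are hypothetical, the helper names
are invented for the example, and ordinary least squares is solved with plain
Python rather than with the statistical package the authors may have used.

```python
def fit_weights(X, y):
    """Ordinary least squares via the normal equations: [intercept, b1, ..., bk]."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(weights, X):
    """Apply unstandardized weights to new predictor scores."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], x)) for x in X]

def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Hypothetical developmental sample: two subtest scores and a composite score.
X_dev = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y_dev = [18, 17, 28, 27, 35, 28]
# Hypothetical validation sample, scored independently.
X_val = [(2, 2), (3, 1), (5, 4), (6, 5)]
y_val = [21, 18, 33, 36]

weights = fit_weights(X_dev, y_dev)                     # weights from developmental sample
validated_r = pearson_r(predict(weights, X_val), y_val)  # "validated" multiple R
print([round(w, 3) for w in weights], round(validated_r, 3))
```

Because the weights are fixed before the validation data are seen, the
resulting correlation is not inflated by capitalization on chance in the
sample used to derive the weights.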
Table 4

The Distribution of the Developmental
and Validation Samples with Respect to
Gender, Race and Education

Race     Education      Gender     Developmental     Validation

White    HS Graduate    Male            191              114
                        Female           61               41
         HS Senior      Male             56               28
                        Female           21                9
         GED            Male            277              124
                        Female           30               19
         Subtotal                       636 (61%)        335 (71%)

Black    HS Graduate    Male            103               37
                        Female           60               19
         HS Senior      Male             43                5
                        Female           20                9
         GED            Male            108               43
                        Female           27                5
         Subtotal                       361 (34%)        118 (25%)

Other    HS Graduate    Male             15                8
                        Female            4                2
         HS Senior      Male              5                1
                        Female            1                0
         GED            Male             22                8
                        Female            3                1
         Subtotal                        50 (5%)          20 (4%)

Total                                 1,047 (100%)       473 (100%)
Table 5

The Distribution of AFQT Scores
on the Operational ASVAB for the
Developmental and Validation Samples

              AFQT           Developmental              Validation
Mental      Percentile     Sample (n = 1,047)         Sample (n = 473)
Category      Score       Percent   Cum. Percent     Percent   Cum. Percent

1             93-99           2         100              1         100
2             65-92          18          98             20          99
3A            50-64           9          80             12          79
3B            31-49          21          71             26          67
4A            21-30          14          50             13          41
4B            16-20          12          36             12          28
4C            10-15          15          24              9          16
5              1-9            9           9              7           7
Results


Table 7 presents the means, standard deviations and reliability coefficients
for the PERT subtests. Mathematics Knowledge (MK) was the most difficult of the
PERT power subtests. Its mean of 3.5 is substantially lower than the means for the
other subtests. The mean and standard deviation for Coding Speed (CS) are misleading
since, as mentioned previously, many applicants were apparently allowed to go
beyond the specified time limit. The reliabilities of the power scales are quite
adequate for ten-item scales.

Results  of  the  regression  analyses  are  presented  in  Table  8.  The  multiple 
R's  in  the  developmental  sample  were  quite  high,  ranging  from  a  low  of  .73  for 
the  Clerical  (CL)  Aptitude  Area  to  a  high  of  .82  for  the  Skilled  Technician  (ST) 
and  General  Technical  (GT)  Aptitude  Areas  and  the  AFQT.  When  the  regression 
coefficients  were  validated  in  the  second  sample,  multiple  R's  decreased 
only  slightly,  with  the  exception  of  predicting  CL,  which  decreased  from  .73 
to  .63.  In  general,  however,  the  PERT  appears  to  be  a  useful  predictor  of 
Army ASVAB composites.

An examination of the standardized beta weights for the PERT subtests
(Table 8) reveals that the most useful subtests were Arithmetic Reasoning (AR),
Word Knowledge (WK), Mechanical Comprehension (MC) and Auto/Shop Information (AS),
followed by Math Knowledge (MK), Paragraph Comprehension (PC) and General
Science (GS). The PERT subtest Electronics Information (EI) did not achieve
a single beta weight above .10. Comparison of the distribution of beta weights
across the PERT subtests in each regression equation with the actual corresponding
ASVAB subtests used to compute ASVAB composites (indicated by underlinings in
Table 8) suggests only a moderate relationship. This may be the result of
sample fluctuations, as the correlations among the operational ASVAB subtests are
moderate to high (see Sims and Truss, 1980; Table K-3). It must also be
noted that the PERT subtests are typically based on half (or fewer) of the
number of items in the ASVAB subtests and, consequently, have lower reliabilities.
The ASVAB subtest scale reliabilities range from .80 to .93 (Ree, Mullins,
Mathews and Massey, in press). This lower reliability of the PERT scales would
account for some of the coefficient attenuation.
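The link between test length and reliability noted above can be illustrated
with the Spearman-Brown formula. This is a sketch added for illustration; the
paper does not report using this formula, and it assumes that any added items
would be parallel to the existing ones. The starting value of .74 is the
coefficient alpha reported for Word Knowledge in Table 7.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

pert_wk_alpha = 0.74                        # ten-item PERT Word Knowledge scale (Table 7)
doubled = spearman_brown(pert_wk_alpha, 2)  # same scale with twice as many parallel items
print(round(doubled, 2))
```

Under these assumptions, doubling a ten-item scale would move its reliability
into the range reported for the full-length ASVAB subtests.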

Summary  and  Conclusion 

In  this  investigation,  a  newly  developed  test  was  validated.  The  PERT 
was  designed  to  be  used  by  recruiters  to  test  Army  applicants  in  order  to  obtain 
an  indication  of  the  applicant's  eligibility  for  enlistment  as  well  as  for 
specific  MOS.  One  of  the  requirements  for  MOS  entry  is  a  qualifying  score  on 
the  relevant  Army  Aptitude  Area  Composite.  The  PERT  responses  of  1,047  May  1981 
Army  applicants  were  used  to  develop  regression  weights  to  predict  their 
ASVAB  composite  scores.  The  prediction  equations  were  validated  in  an 
independent  sample  of  473  June  1981  Army  applicants.  The  results  from  this 
research  demonstrate  that  the  PERT  would  be  a  useful  tool  for  recruiters.  The 
final  step  remaining  before  implementation  of  the  PERT  as  an  Army  recruiting 
tool  is  to  equate  the  PERT  scales  to  the  ASVAB  scales.  Recruiters  need  to  know 
what  ASVAB  values  are  predicted  by  which  PERT  values.  This  step  will  be 
completed in the near future.


Table 7

Means, Standard Deviations and Reliabilities
of PERT Subtest Scales

                                             Standard     Coefficient
PERT Subtest¹                       Mean     Deviation       Alpha

Word Knowledge (WK)                  5.6        2.6           .74
Arithmetic Reasoning (AR)            5.1        2.6           .76
Paragraph Comprehension (PC)         5.7        2.6           .73
Mechanical Comprehension (MC)        5.4        2.3           .66
General Science (GS)                 5.2        2.2           .62
Electronics Information (EI)         5.0        2.3           .60
Mathematics Knowledge (MK)           3.5        2.2           .64
Auto/Shop Information (AS)           4.9        2.5           .69
Coding Speed (CS)²                  36.0       18.1

Note: ¹All PERT subtests have 10 items apiece, except CS,
which has 72.

²Coefficient alpha was not computed for CS since it
is a speeded test.
¹ASVAB subtests used to compute composites are indicated by underlining.
Standardized beta weights less than .10 are omitted.

²ASVAB subtest NO also used to compute operational ASVAB Army composite.
The Great Training Robbery

Abstract

Employee selection and training constitute the major methods available
to an organization for improving the ability of the work force. In large
part the major objective for both activities is identical, although they
achieve this purpose quite differently. Selection seeks to enhance ability
levels through a process of elimination. Selection procedures enable an
organization to hire a greater proportion of high-ability employees than
would otherwise be possible. Training, alternatively, seeks to achieve the
objective by increasing the ability levels of the existing work force.

There are several reasons why most organizations will engage in both
selection and training, even though the objective of the two is identical.
For example, it is generally very difficult to achieve a highly skilled
work force using just one of the procedures. All DoD elements use both
procedures. However, by not going one step further and training employees
to enable them to approach their jobs from a common organizational frame of
reference, the organization has, in effect, robbed itself of valuable
personnel and training resources. To prevent this, an innovative approach
to training has enabled two DoD human factors organizations to increase
their return for the training dollar and avoid the great training robbery.


No scene from prehistoric times is quite so vivid as that of the
struggles of great beasts in the tar pits. In the mind's eye one sees
dinosaurs, mammoths and sabre-toothed tigers struggling against the grip of
the tar. The fiercer the struggle, the more entangling the tar, and no
beast is so strong or so skillful but that he ultimately sinks.

Training over the past years has been such a tar pit, and many great
and powerful beasts have thrashed violently in it. Many training
approaches have emerged with working programs, but few have met fully the
goals and expectations. Large and small, massive and puny, approach after
approach has become entangled in the tar. No one thing seems to cause the
difficulty - any particular paw can be pulled away. But the accumulation
of simultaneous and interacting factors brings slower and slower motion.
Everyone seems to be surprised by the stickiness of the problem, and it is
hard to discern the nature of it. But we must try to understand the nature
of it if we are to solve it.

This paper has three objectives. First, we will describe some basic
dimensions of ability, focusing on differences in ability both within and
between people. Second will be a description of how an organization can
attempt to match the abilities of its employees with the ability
requirements of its jobs. Finally, attention is given to how an organization
can attempt to upgrade the ability levels of its work force through its
selection and training procedures.

We are all aware that group differences in physiological and
psychological factors exist. What is often not so well known is that differences
between individuals are generally much greater than differences between
groups. Thus, knowing only one's group affiliation often tells us little
about the individual even though we know the average performance of the
group.

There are two types of individual differences that we must consider,
interindividual and intraindividual differences. Interindividual
differences pertain to differences between people - for example, differences in
weight, intelligence and vision. On virtually every physical and
psychological dimension, people demonstrate great variability.

Interestingly, however, the patterns of variability found among
individuals on different characteristics tend to be quite similar.
Specifically, the majority of persons tend to be arranged close to average
on the characteristic, while relatively few people tend to be extremely
high or low. If we were to take measurements on some characteristic from a
fairly large number of persons, the frequency distribution would often
appear as the classical bell-shaped curve.

These general observations should suggest to us something of the
type of finding to be expected if we were to analyze the performance of a
group of employees. If the distribution on some measure of performance
varies markedly from the bell-shaped curve, caution is in order. Suppose,
for example, that almost all of the employees are rated as high performers.
In that case we might suspect that (1) the organization has been enormously
successful in eliminating, through selection and/or training, the
individual differences to be expected, or (2) bias has crept into the
measurement system. The latter is frequently the more probable explanation.

In contrast, intraindividual differences occur within individuals
and have to do with the relationship between two or more characteristics.
For example, if we know that an individual is high in verbal ability, can
we also assume that he is high in numerical reasoning ability? In general,
the answer to questions of this sort is no. Although positive
relationships do exist between certain characteristics within individuals, these
relationships tend to be low.

Again, the implications for measuring performance are fairly obvious.
We would expect little relationship between various performance
subcomponents of an individual being observed. Someone who is a conscientious
performer, for example, may have only average human relations skills and be
quite unknowledgeable about the task. A high degree of correspondence
between measured subcomponents taken on the same individual should make us
suspicious of the process we are employing to get those measures.


In an organizational context, it is of little value to view the
abilities we have been discussing in some absolute sense. We cannot say, for
example, that an individual possessing third grade language capabilities
is adequate or inadequate without additional knowledge about the job
he is to perform. If the job requires only an ability to write one's
name and read very simple instructions, the individual's language
capabilities may be adequate. Indeed, evidence suggests that individuals with
capabilities exceeding the requirements of the job may perform as
inadequately as those who do not possess the requisite skills.

We think in terms of matching a person to the job he is to perform.
Such a matching process, in turn, requires that we be able to measure the
abilities of the individual and the ability requirements of the job.
Individual abilities to perform on the job are often measured by
performance appraisals designed to identify various traits or characteristics of
workers. However, appraisal systems having workers' traits as their major
focus have generally not been very satisfactory. This results partly from
the difficulty of assessing individual traits by a procedure which requires
one person to observe another. It is also partly a system fault - that is,
the traits measured in many appraisal systems bear little obvious relation
to successful performance of the task.

An alternative appraisal procedure which has gained increasing
acceptance focuses on the behaviors rather than on the traits of an employee.
Through the use of critical incidents and related techniques, efforts are
made to identify behaviors which are closely related to either successful
or unsuccessful task performance. Although more will be said of these
procedures later, we wish to point out here that behaviors appear easier to
observe than ability traits per se. Thus, successful performance appraisal
systems are likely to measure behaviors which are one step removed from the
direct measurement of abilities.

In the preceding paragraphs ability was defined, the nature of
individual differences in ability was identified, and the need to match
individuals to tasks in terms of the abilities required was discussed. Now
the focus turns to methods an organization has at its command to manipulate
individual ability so that a congruent person-job match may be obtained.
Our discussion will deal with two general procedures that exist, employee
selection and employee training, or development.

A major way in which organizations attempt to manipulate the ability
levels of their work force is through employee selection. The core problem
of selection involves the identification of appropriate ability levels
among job applicants. Appropriate ability is normally defined in terms of
the types of skills required for successful performance of some task.

Thus, selection can be thought of as being concerned with the
identification (prediction) of successful task performers prior to their employment
and can be achieved by the use of one or more predictors (e.g., tests,
interviews) to assess the probable future organizational success of job
applicants.
Suppose, for example, that a group of sales aspirants applying to
an organization are given a short general aptitude test and an interest
inventory which measures preferences for various types of careers. After a
sufficient time had elapsed, their ability to perform the job would be
examined. The examination might take the form of a supervisory performance
evaluation of each employee. This evaluation and the results from the
two tests would then be compared. Suppose that this comparison showed
that: (1) individuals preferring business and related careers on the
interest inventory generally received higher evaluations than persons
preferring nonbusiness-related careers, and (2) there was no relationship
between aptitude test scores and supervisory evaluations.

The results obtained above could subsequently be used to assist in
the selection of new employees. Specifically, future job applicants should
be given the interest inventory, but not the aptitude test, since the
latter showed no relation to job success. Efforts should be made to hire
those applicants who indicate business and related career preferences on
the interest inventory. This hiring procedure should result in the
employment of a larger proportion of individuals who will contribute to the goals
of an organization than would otherwise be the case.
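The comparison of predictors with later evaluations described in this
illustration can be sketched as a simple correlational check. Everything in
the sketch is hypothetical: the scores, the variable names and the cutoff of
.30 for retaining a predictor are assumptions made for the example, not values
taken from any study.

```python
def pearson_r(a, b):
    """Product-moment correlation between two equal-length score lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Hypothetical data for six employees hired without regard to the predictors.
supervisor_rating = [2, 3, 4, 5, 6, 7]        # criterion collected after time on the job
predictors = {
    "interest_inventory": [1, 2, 2, 4, 5, 5],  # preference for business careers
    "aptitude_test": [50, 60, 55, 55, 60, 50],
}

MIN_VALIDITY = 0.30  # assumed cutoff for keeping a predictor

validities = {name: pearson_r(scores, supervisor_rating)
              for name, scores in predictors.items()}
kept = [name for name, r in validities.items() if abs(r) >= MIN_VALIDITY]
print(validities, kept)
```

With these made-up scores only the interest inventory is retained, mirroring
the outcome supposed in the text.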

The accuracy of the statement above depends on three important
assumptions that require brief elaboration. First, for any selection procedure
to be useful there must be more applicants than there are jobs. This is
necessary so that the organization can choose among the applicants,
accepting the "best" and rejecting the "poorest." In the illustration, this
would consist of accepting individuals showing business and related career
preferences on the interest inventory.

Second, we must assume that the conditions prevailing when the
predictors were initially validated apply when they are used to make selection
decisions. If they do not, the predictor-performance relationships
observed in the validation study may inadequately describe the
relationships under the changed conditions. We must assume, for example, that job
applicants as a group remain essentially the same over time. This is
likely to be a tenuous assumption if the job market fluctuates markedly.
Additionally, we must assume that the content of the jobs involved remains
stable over time. This, too, is a tenuous assumption in a technologically
innovative organization.

The only certain test for the second assumption is a continual
revalidating of our predictor instruments through time. We may expect to find
some shifting about in the contribution of various predictors to successful
selection. It should also be obvious from this discussion that the
utilization of a predictor in our organization simply because it has
demonstrated validity in some other organization is unwarranted. We cannot
safely assume that the predictor-performance relationship will generalize
to our situation no matter how similar the tasks and job applicants appear.

The final assumption is in some respects the most important. We
must assume that the measure of job performance, in the illustration - a
supervisory evaluation, "gets at" what we regard as important for the
success of our organization. For example, do differences in evaluations
reflect real performance differences, or do they instead merely reflect
supervisors' preferences for various employees? For the purposes of
this paper, let it suffice to say that employee selection constitutes one
important personnel activity where adequate performance appraisal is
crucial.

Training,  or  development,  of  employees  is  the  second  major  method 
an  organization  employs  to  manipulate  the  ability  levels  of  its  work 
force. Training frequently involves new employees, but may also include
existing  workers  whose  skills  are  deemed  insufficient  for  their  current 
job  or  for  a  job  to  which  they  are  promoted. 

Like  selection,  training  can  be  viewed  as  a  process  for  manipulating 
skill  levels.  As  such,  training  may  be  thought  of  as  involving: 

1.  Identification  of  the  skills  to  be  learned  through  training; 

2.  Identification  of  participants  to  receive  the  training; 

3.  Development  or  selection  of  procedures  which  enable  participants 
to  learn  efficiently  the  required  skills; 

4.  Appraisal  of  the  training  procedures'  effectiveness. 

Once we have identified what skills are to be learned, we can turn to an
identification of the persons who would benefit from learning them. With
new employees, everyone in the group may reasonably be included. This is
particularly appropriate when the skills to be learned are relatively
unique to the organization under consideration.

Bass and Vaughan (1966) suggest that any training technique be judged
by how well it conforms to the findings from learning theory. As such,
they suggest that an appropriate training procedure (p. 86):

1.  Provide  for  the  learner's  active  participation; 

2.  Provide  the  trainee  with  knowledge  of  results  about  his  attempts 
to  improve; 

3.  Promote  by  means  of  good  organization  a  meaningful  integration  of 
learning  experiences  that  the  trainee  can  transfer  from  training 
to  the  job; 

4. Provide some means for the trainee to be reinforced for
appropriate behavior;

5.  Provide  for  practice  and  repetition  when  needed; 

6.  Motivate  the  trainee  to  improve  his  own  performance; 

7.  Assist  the  trainee  in  his  willingness  to  change. 

An example of how this procedure was utilized can be seen from the
efforts of two DoD human factors organizations: the US Army Human
Engineering Laboratory, and the US Navy's Human Engineering Branch of the
Pacific Missile Test Center.

These organizations faced a common personnel problem. Both groups
hired people with various educational backgrounds. Not only did the
educational accomplishments of the employees range from college to the
post-graduate level, but the expertise ranged across various fields such as
psychology, anthropometry, engineering and computer science. Like most
organizations, these DoD components engaged in both selection and training
to meet their personnel needs.

Because the Army and Navy are increasingly pleased with the success
of self-paced, interactive instruction and programmed instruction texts,
they collaborated on a computer-based training course that combined modern
training procedures with an approach that allows the accomplishment of
the seven steps mentioned above. Additionally, by recognizing and
addressing the need to provide their people with a common technical frame of
reference among human factors specialists - wherever they may be
employed - these organizations have increased their return for the training
dollar. By using this additional step, they have avoided the great
training robbery that so often inhibits the potential for the maximum
payoff of valuable personnel and training resources.

In conclusion, the most suitable evaluation from an organization's
point of view will be direct evidence about employee performance. The
basic question is, has the performance of the participants benefited from
the training program? Thus in training, as in selection, the ultimate
value can be determined only after employees' contributions to
organizational objectives can be properly assessed. Success can be realized
when innovative approaches are used to increase the efficiency of good
training and selection procedures.
Abstract

Recent research by Maier and Grafton (1981) has demonstrated the
effectiveness of aptitude composites from the ASVAB 8, 9, and 10 in predicting training
success and job proficiency. The present study was concerned with assessing the
ability of the ASVAB composite scores to discriminate individuals in Infantry
training recommended for early separation under the Trainee Discharge Program
(TDP) from successful trainees. The composite scores of 87 TDPs and 87 non-TDPs
were compared. A discriminant analysis revealed that approximately 69% of the
TDPs and 69% of the successful trainees could be correctly classified using a
weighted combination of three ASVAB composites. The classification functions
derived from the initial sample were applied to a second sample (N=80) and showed
that approximately 70% of both the TDPs and successful recruits could be correctly
classified. When these classification functions were applied to a more realistic
sample (N=419) and success-failure base rate was taken into account, it was found
that 97% of the successful recruits could be correctly classified, but only 3%
of the TDPs. Reasons for the ASVAB's inability to accurately identify TDPs and
directions for future research are discussed.
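The effect of the base rate on classification can be illustrated with a small
sketch. It is not a reconstruction of the discriminant analysis reported here.
It assumes, purely for illustration, that discriminant scores in the two groups
are normally distributed with equal variance, that the groups are separated
just enough to give 69% correct classification in each group when the groups
are equally likely, and that 90% of recruits succeed (roughly the attrition
rate cited in the Introduction). The function name and these values are
hypothetical.

```python
from math import log
from statistics import NormalDist

def hit_rates(separation, prior_success):
    """Proportion correctly classified (successes, TDPs) at the error-minimizing cutoff.

    Scores are assumed normal with unit variance; group means differ by `separation`.
    """
    z = NormalDist()
    # The prior odds move the cutoff away from the midpoint between the group means.
    shift = log(prior_success / (1.0 - prior_success)) / separation
    return z.cdf(separation / 2 + shift), z.cdf(separation / 2 - shift)

# Separation that yields 69% correct in each group when priors are equal.
separation = 2 * NormalDist().inv_cdf(0.69)

balanced = hit_rates(separation, 0.50)   # matched samples of TDPs and non-TDPs
realistic = hit_rates(separation, 0.90)  # assumed 90% success base rate
print([round(p, 2) for p in balanced], [round(p, 2) for p in realistic])
```

Under these assumptions, a rule that looks moderately useful in matched samples
classifies nearly every recruit as a success once the base rate is taken into
account, so very few TDPs are identified.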

Introduction 

Since the inception of the all volunteer Army in 1973, there has been an
increasing focus on personnel attrition. Recent reports (e.g., Frank & Erwin,
1978; Youngblood, Laughlin, Mobley, & Meglino, 1980) indicate that approximately
ten percent of all incoming enlistees do not make it through Basic Combat
Training (BCT) and One Station Unit Training (OSUT). Given the size of the total
incoming enlistee population, such attrition rates represent a sizable cost to
the Army in terms of both money and wasted training time.

Currently, recruits who are having problems (non-medical in nature) in
OSUT and who have not responded to various motivational strategies (e.g.,
counseling) are recommended for early discharge under the Trainee Discharge Program (TDP).
The Trainee Discharge Program was designed by the Department of the Army to permit
the rapid separation of those individuals deemed unfit for military service. The
existence of such a program is highly desirable from the standpoint of the unit's
morale as well as one of training efficiency (i.e., the sooner undesirables are
identified and removed, the less expense there is to the Army).

The rationale behind the Trainee Discharge Program is sound but can be
quite costly. For example, during FY 1980 at Ft. Benning there were
approximately 1400 TDPs; most of these individuals were identified by about the sixth
week of training. When one considers that the cost of putting one enlistee through
OSUT at Ft. Benning during FY 1980 was $23,599.00, it becomes clear that a
large amount of money is being lost by the Army on these unsuccessful recruits.

If, however, a significant number of these TDPs could be reliably
identified and screened out before they begin OSUT, a substantial amount of time and
money could be saved.

Several strategies have been suggested for identifying early attrites,
e.g., autobiographical data (Frank & Erwin, 1978); assessment of recruits' expected
satisfaction, intentions toward completing their enlistment, and their attraction
to the role of being a soldier (Youngblood, Laughlin, Mobley, & Meglino, 1980).
Early reports on these efforts are quite promising, but they are still in the
developmental stage and/or have yet to be implemented on a large scale.

Another potential selection device is the Armed Services Vocational
Aptitude Battery (ASVAB). Although not designed specifically for this purpose,
it does have the advantage of already being used on a large scale by the
Department of Defense and, if found to be successful at identifying early
attrites, would represent a relatively cost-free selection instrument.

The ASVAB was designed by the Department of Defense to assess mental
qualification for enlistment and to classify accessions into skill training
programs. The ASVAB provides an Armed Forces Qualification score, which is
already used to screen out applicants unqualified (ability-wise) for enlistment.
The most recent ASVAB, Forms 8/9/10, contains ten subtests and ten composites
(i.e., Combat, Field Artillery, Electronics Repair, Operators and Food,
Surveillance and Communications, Mechanical Maintenance, General Maintenance,
Clerical, Skilled Technical, and General Technical). Each composite, with
the exception of General Technical, is composed of from three to five subtests
which were found to be the most valid predictors of success in the job training
programs. The aptitude composites are used to determine eligibility for
assignment to job specialties. (See Maier & Grafton, 1981, for a more complete
description of the subtests and the composites comprising the ASVAB.)

The primary focus of this investigation, however, was to assess the utility
of  ASVAB  as  a  predictor  of  early  attrition  from  the  U.S.  Army. 

Method 


Subjects

Scores  on  the  ASVAB  Composites,  Ethnic  Status,  Performance-Oriented 
Infantry  Qualification  Test  (POIQT)  results,  and  graduation  status  (whether  or 
not  a  recruit  was  a  TDP)  were  obtained  on  1094  men  who  were  members  of  OSUT 
companies  which  began  training  between  1  September  and  31  December  1980. 

Out of these 1094 recruits, 170 were identified as TDPs.


Results  and  Discussion 


During the period sampled, two different versions of the
ASVAB were given (7b and 8/9/10). Fruchter and Ree (1977) have suggested
that these are equivalent forms, and they are treated as such here from a
statistical perspective. A stepwise discriminant analysis (Nie, Hull,
Jenkins, Steinbrenner, & Bent, 1975) based on 87 TDPs and 87 successful
trainees (determined from passing POIQT scores) was performed using nine
ASVAB components as the discriminating variables. While the successful
recruits scored significantly higher on all of these components, the results
of the stepwise discriminant analysis indicated that the two groups could be
optimally discriminated using three components: combat arms, clerical
knowledge, and electronics information.

The chi-square associated with the resulting lambda of .8174 was highly
significant, χ²(3) = 34.368, p < .00001, indicating that the three components
could  differentiate  between  the  groups.  It  should  be  noted,  however,  that 
the  large  lambda  signifies  that  the  difference  between  the  group  centroids 
(.469  for  the  successful  trainees  and  -.469  for  the  TDPs)  was  rather  small. 
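
The reported chi-square can be approximately reproduced from the lambda value. The sketch below assumes the statistic was obtained with Bartlett's approximation for Wilks' lambda, which the text does not state explicitly; the function name is illustrative. The inputs are the figures given above: lambda = .8174, 174 cases (87 + 87), three discriminating variables, and two groups.

```python
import math


def bartlett_chi_square(wilks_lambda, n_cases, n_variables, n_groups):
    """Bartlett's chi-square approximation for Wilks' lambda.

    chi2 = -[N - 1 - (p + g) / 2] * ln(lambda), with p * (g - 1) degrees
    of freedom, where N is the number of cases, p the number of
    discriminating variables, and g the number of groups.
    """
    multiplier = n_cases - 1 - (n_variables + n_groups) / 2
    chi_square = -multiplier * math.log(wilks_lambda)
    degrees_of_freedom = n_variables * (n_groups - 1)
    return chi_square, degrees_of_freedom


chi_square, df = bartlett_chi_square(0.8174, n_cases=174, n_variables=3, n_groups=2)
print(f"chi-square({df}) = {chi_square:.2f}")
```

Under this assumption the result falls within rounding distance of the reported 34.368, the small gap being attributable to lambda having been rounded to four decimals.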

The  single  discriminant  function  that  was  derived  was  also  significant 
(by  virtue  of  the  significant  chi-square)  and  took  the  following  form: 

Discriminant Score = -12.1111 + .03387(Combat) + .06076(Electronics) - .2705(Clerical)


Since the major interest was in the ASVAB's ability to identify TDPs,
a separate "classification function" for each group was derived based on the
three aforementioned discriminating variables (composites). The functions
for each group were as follows:

Successful Trainees

Score = -80.660 + .56093(Combat) + .63416(Electronics) - .2705(Clerical)

TDPs

Score = -69.279 + .52909(Combat) + .57704(Electronics) + .25076(Clerical)

Each  function  was  applied  to  both  groups  of  enlistees,  producing  two  class¬ 
ification  scores  per  person.  A  case  was  classified  into  the  group  whose  function 
yielded  the  highest  score,  since  the  scores  have  the  property  that  the  case  re¬ 
sembles  most  closely  that  group  on  which  it  has  the  highest  score  (Klecka,  1980). 
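
The classification rule just described (score every case on each group's function and assign it to the group with the higher score) can be sketched as follows. The coefficients in the example are invented purely for illustration and are not the study's values; the function names and the two sample recruits are likewise hypothetical.

```python
def classification_score(constant, weights, composites):
    """Linear classification score: constant + sum of weight * composite."""
    return constant + sum(weights[name] * composites[name] for name in weights)


def classify(composites, functions):
    """Return (predicted_group, scores) for one case.

    functions maps a group name to a (constant, weights) pair; the case
    is assigned to the group whose function yields the highest score.
    """
    scores = {
        group: classification_score(constant, weights, composites)
        for group, (constant, weights) in functions.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical coefficients, NOT those reported in the study.
example_functions = {
    "successful": (-10.0, {"Combat": 0.06, "Electronics": 0.05, "Clerical": 0.02}),
    "TDP": (-8.0, {"Combat": 0.05, "Electronics": 0.04, "Clerical": 0.02}),
}

# Hypothetical composite scores for two recruits.
recruit_a = {"Combat": 110, "Electronics": 105, "Clerical": 100}
recruit_b = {"Combat": 85, "Electronics": 80, "Clerical": 100}

group_a, scores_a = classify(recruit_a, example_functions)
group_b, scores_b = classify(recruit_b, example_functions)
print("Recruit A:", group_a, scores_a)
print("Recruit B:", group_b, scores_b)
```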

When  these  functions  were  applied  to  the  derivation  sample,  69.54%  of  all 
the  recruits  were  correctly  classified.  Cell  breakdowns  showed  that  approximately 
69%  of  the  TDPs  and  70.1%  of  the  successful  trainees  were  correctly  identified. 


The  General  Technical  (GT)  score  was  omitted  from  the  analysis  due  to 
its  redundancy  with  several  of  the  remaining  components. 
The same functions were then applied to a holdout (cross-validation)
sample  of  TDPs  (N=40)  and  successful  recruits  (N=40)  and  resulted  in  no  loss 
of  accuracy.  In  fact,  71.25%  of  all  recruits  in  this  sample  were  correctly 
classified.  Cell  breakdowns  showed  that  70%  of  the  TDPs  and  72.5%  of  the 
successful  trainees  were  correctly  identified. 
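
The percentages reported for the derivation and holdout samples follow directly from cell counts. In the sketch below, the holdout counts (28 of 40 and 29 of 40) are implied exactly by the reported 70% and 72.5%; the derivation counts (60 of 87 and 61 of 87) are inferred from the rounded percentages and are therefore an assumption rather than figures given in the text.

```python
def hit_rates(tdp_correct, n_tdp, success_correct, n_success):
    """Return (TDP hit rate, successful hit rate, overall hit rate)."""
    tdp_rate = tdp_correct / n_tdp
    success_rate = success_correct / n_success
    overall_rate = (tdp_correct + success_correct) / (n_tdp + n_success)
    return tdp_rate, success_rate, overall_rate


# Derivation sample: counts inferred from the reported percentages.
derivation = hit_rates(tdp_correct=60, n_tdp=87, success_correct=61, n_success=87)
# Holdout sample: counts implied by the reported 70% and 72.5%.
holdout = hit_rates(tdp_correct=28, n_tdp=40, success_correct=29, n_success=40)

for label, (tdp_rate, success_rate, overall_rate) in (
    ("derivation", derivation),
    ("holdout", holdout),
):
    print(
        f"{label}: TDPs {tdp_rate:.1%}, successful {success_rate:.1%}, "
        f"overall {overall_rate:.2%}"
    )
```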

In  order  to  assess  the  effectiveness  of  these  functions  in  a  more 
realistic  situation,  a  sample  was  created  in  which  ten  percent  of  the  recruits 
were  TDPs  (N=42)  and  the  remaining  ninety  percent  were  successful  trainees 
(N=377). The classification functions were then applied to this second cross-
validation sample; this time, however, the functions were adjusted to take into
account the prior probabilities of belonging to the successful trainee group
(.90) or the TDP group (.10). The results of this validation effort indicated
that 88% of all recruits were correctly classified. However, the individual
cell breakdowns showed that 97% of the successful recruits could be correctly
classified but only 3% of the TDPs (see Table 1). These results strongly
suggest  that  the  predictive  utility  of  the  ASVAB  with  respect  to  identifying 
early  attrites  is  negligible. 
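
One way to see why the prior adjustment removes nearly all TDP classifications is to look at how the cutoff on the discriminant score moves. The sketch below is an illustration, not the authors' computation. It uses the reported group centroids (.469 and -.469) and assumes, beyond what the text states, that discriminant scores are normally distributed within each group with a standard deviation of 1. Under those assumptions a case is assigned to the TDP group when its score falls below the midpoint of the centroids minus ln(prior ratio) divided by the distance between centroids.

```python
import math

# Group centroids reported in the text.
CENTROID_SUCCESS = 0.469
CENTROID_TDP = -0.469


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tdp_cutoff(prior_success, prior_tdp):
    """Discriminant score below which a case is assigned to the TDP group.

    Assumes (not stated in the text) normal scores with unit standard
    deviation in both groups.
    """
    midpoint = (CENTROID_SUCCESS + CENTROID_TDP) / 2.0
    separation = CENTROID_SUCCESS - CENTROID_TDP
    return midpoint - math.log(prior_success / prior_tdp) / separation


def expected_hit_rates(prior_success, prior_tdp):
    """Cutoff and expected share of each group classified correctly."""
    cutoff = tdp_cutoff(prior_success, prior_tdp)
    tdp_rate = normal_cdf(cutoff - CENTROID_TDP)
    success_rate = 1.0 - normal_cdf(cutoff - CENTROID_SUCCESS)
    return cutoff, tdp_rate, success_rate


equal_priors = expected_hit_rates(0.50, 0.50)
unequal_priors = expected_hit_rates(0.90, 0.10)
for label, (cutoff, tdp_rate, success_rate) in (
    ("equal priors", equal_priors),
    ("priors .90/.10", unequal_priors),
):
    print(
        f"{label}: cutoff = {cutoff:.2f}, TDPs correct = {tdp_rate:.1%}, "
        f"successful correct = {success_rate:.1%}"
    )
```

Under these assumptions, equal priors place the cutoff at zero and classify roughly two thirds of each group correctly, while priors of .90 and .10 push the cutoff well below the TDP centroid, so that only a few percent of TDPs fall beneath it. The pattern is in line with the reported results, although the exact figures depend on the normality assumption.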

Inspection of the distribution of discriminant scores provides some insight
into the ASVAB's inability to identify early attrites. Figure 1 shows the distribution
of discriminant scores for both the TDPs and successful trainees. It
can  be  seen  that  the  distributions  of  both  groups  overlap  with  each  other  to  a 
considerable  extent.  The  overlapping  distributions  and  a  90%  base  rate  of 
success  in  OSUT  may  explain  in  large  part  the  inability  of  the  ASVAB  to 
discriminate  the  TDPs  from  the  successful  trainees. 

Further  analysis  of  the  two  distributions  raises  some  other  noteworthy 
points.  Assuming  that  these  sample  distributions  are  representative  of  the 
actual  population  distributions  for  TDPs  and  successful  trainees,  one  thing 
that  stands  out  is  that  the  population  of  TDPs  appears  more  homogeneous  and 
tends  to  cluster  at  the  lower  end  of  the  continuum.  There  are,  however,  some 
notable exceptions. Approximately 22% of the TDPs fall in the mid to upper end
of the continuum of discriminant scores. Also of interest is that some of the
successful  recruits  had  scores  at  the  extreme  lower  end  of  the  continuum. 

The  cluster  of  TDPs  at  the  lower  end  suggests  that  these  individuals  may 
be  deficient  in  some  core  ability  like  reading  comprehension  and  this  alone 
could  account  for  numerous  problems  in  OSUT. 

Nevertheless,  it  is  also  clear  from  Figure  1  that  some  people  with  these 
"deficiencies"  do,  in  fact,  make  it  through  training.  Several  possibilities 
may  account  for  this:  the  unreliability  of  the  instrument,  high  levels  of 
motivation  that  would  overcome  any  learning  deficit  (i.e.  halo  effect),  or 
certain  dispositional  characteristics  (e.g.  compliance)  that  would  allow 
someone  to  go  through  training  with  a  minimal  amount  of  conflict.  The  poss¬ 
ibility  also  exists  that  incompetent  people  are  not  being  terminated  from  the 
Army  during  OSUT  due  to  differential  criteria  for  evaluating  competence  at  the 
company  level. 
TABLE 1

COMPARISON OF ACTUAL GROUP MEMBERSHIP WITH
PREDICTED GROUP MEMBERSHIP FOR TDPs AND
SUCCESSFUL TRAINEES*

                            PREDICTED GROUP MEMBERSHIP
ACTUAL GROUP      N         SUCCESSFUL      TDP

SUCCESSFUL        377       368 (97.6%)     9 (2.4%)
TDP               42        41 (97.6%)      1 (2.4%)

PERCENTAGE OF RECRUITS CORRECTLY CLASSIFIED: 88%
*PRIOR PROBABILITIES SET AT .90 (SUCCESS) AND .10 (FAIL)


[Figure 1. Distribution of discriminant scores for TDPs and successful trainees. Horizontal axis: Discriminant Scores.]


The presence of TDPs at the mid to upper end of the distribution would
seem to indicate, though, that a single core ability like reading comprehension
(while important) is not the only reason why someone would be unsuccessful.

It should come as no surprise that people might be unsuccessful for any
number of reasons, some of which are unrelated. There is no reason to expect, for
example, that problems with authority figures should be related to reading skills.

Conclusion 

The results of this study suggest that while the ASVAB may be a reliable
predictor of job proficiency (Maier & Grafton, 1981), it does not appear to be
a very effective predictor of attrition in basic training. It should be
reiterated, though, that the ASVAB was not designed with this purpose
specifically in mind.

Nevertheless,  this  research  effort  does  suggest  a  course  for  future 
research in this area. Although some aptitude/ability may be necessary to
get through basic training, other factors which may be even more critical in
determining trainee success are the motivation, interpersonal style, and
background of the recruits.

Thus,  future  efforts  at  identifying  high  risk  recruits  might  be  more 
profitably  directed  toward  the  following  strategies  (or  combinations): 
intention and attitudinal measures (e.g., Youngblood et al., 1980);
autobiographical data (e.g., Frank & Erwin, 1978); and realistic job previews
(e.g., Wanous, 1978).

Given  the  expenditures  associated  with  those  recruits  who  are  separated 
from  the  Army  prior  to  the  completion  of  basic  training,  strategies  for 
identifying those individuals before they enter basic training should be
vigorously  pursued  and  investigated. 
Applying Linguistic Theory to Job-Related Verbal Tests


Ongoing research at OPM has indicated the applicability of linguistic
theory to the development of tests containing a verbal component.
The ultimate aim of this research is a practical set of
construction rules that will allow item writers to control the reading
level  of  a  test  so  that  it  (1)  is  job-related  and  (2)  does  not  impede, 
to  any  significant  level,  performance  on  item  types  which  are  intended 
to  test  a  non-verbal  ability. 

This  paper  describes  a  technique  for  measuring  and  controlling 
verbal  difficulty  based  on  applied  linguistic  theory.  The  technique 
has  been  used  successfully  to  control  the  effect  of  verbal  complexity 
on  performance  on  a  test  of  deductive  reasoning.  It  is  currently  being 
evaluated  to  determine  its  usefulness  in  identifying  the  reading  level 
of  a  job  and  in  developing  a  job-related  reading  test. 

The  technique  is  explained,  with  examples,  and  its  application 
to  job-related  verbal  tests  is  discussed. 
Applying Linguistic Theory to Job-related Verbal Tests
Alicia Kenaston

Personnel Research and Development Center
United States Office of Personnel Management, Washington, D.C.

Ongoing  research  at  the  Personnel  Research  and  Development  Center 
of  the  Office  of  Personnel  Management  focusses  on  developing  a  set  of 
rules  based  on  applied  linguistic  theory  for  the  construction  of  verbal 
items. The ultimate aim of this research is a practical set of procedures
that will allow item writers to control the verbal difficulty and make-up
of an item in such a way that the verbal component of a test is both appropriate
and job-related. The verbal component is appropriate if it does
not  impede,  to  any  significant  level,  performance  on  item  types  which 
are  intended  to  test  a  non-verbal  ability,  e.g.,  arithmetic  reasoning. 

A  job-related  verbal  component  conforms  to  the  level  of  linguistic  ability 
required  for  successful  performance  on  the  job  for  which  the  applicant  is 
being  tested. 

The  initial  thrust  of  this  research  has  been  to  develop  a  method  to 
control  and  quantify  syntactic  complexity  at  the  sentence  level.  The 
research  departs  from  the  premise  that  the  sentence  is  "the  ordinary  vehicle 
of  linguistic  reference"  (Whorf,  quoted  in  Bever,  1972).  This  implies 
that  the  receiver  of  a  message,  whether  written  or  spoken,  does  not  only 
decode  the  various  lexical  units  with  which  he  or  she  is  confronted  but 
must  also  recognise  and  decode  the  syntactic  and  semantic  relationships 
which  intertwine  the  lexical  units  in  order  to  understand  the  message. 
Studies of child language acquisition (for example, Bloom, 1970) have
suggested that even the one-word utterances of children just learning to
speak are actually sentences, loaded with syntactic and semantic information
that the child has not yet learned to encode linguistically. Gesture and
situational cues help to decipher these utterances. So, for example, the
utterance "mommy" has an almost unlimited number of potential meanings:
"I see mommy;" "Mommy come here;" "I want mommy;" "This thing belongs to
mommy;" etc.

The psycholinguistic theory upon which the method for measuring language
complexity described here is based proceeds from the premise that the
sentence is the basic unit for encoding and decoding a message. However,
since the capacity of short-term memory is limited, a "pre-processor"
of linguistic information has been hypothesized. That is, a sentence which
is  too  long  to  be  decoded  in  its  entirety  must  be  broken  down  into  units 
which  are  individually  deciphered  and  stored.  The  investigations  of  Bever 
(1972)  enumerate  three  principal  features  of  the  pre-processing  mechanism: 

(1)  The  clause  is  the  primary  perceptual  unit;  (2)  Within  each  clause, 
semantic  relations  between  major  phrases  are  assigned;  and  (3)  The  clause 
is  recoded  into  relatively  abstract  form  to  "make  room"  for  the  next  clause. 
Bever's findings suggest, in part, that the number of clauses per sentence
is a factor in sentence difficulty.

The  Clause  Analysis  Technique 

Cook (1975) has developed a clause analysis technique for the measurement
of style complexity. This method has the advantages of being fairly
easy to use while yielding detailed syntactic information. It can easily
be employed by anyone with a strong grasp of traditional grammar and a mind
open to a few insights from linguistic theory.

Independent of Bever's work, Cook designated the clause as the basic
unit of analysis. The technique involves three levels of clause structure:
(1) the basic unit, (2) the information block, which consists of a main
clause and the clause group clustered around it, and (3) the sentence,
which may consist of one or, in the case of compound sentences, more
information blocks.

There  are  two  major  steps  in  the  clause  analysis  technique,  the  first 
to  identify  the  clauses  and  the  second  to  compute  the  style  indices,  which 
are  based  on  the  three  levels  of  clause  structure  described  above.  A  clause 
contains  a  verb  phrase,  usually  only  one,  although  conjoined  verb  phrases  are 
possible.  The  first  rule  for  identifying  clauses  is  thus  to  identify 
the  verb  phrases.  Due  to  the  application  of  transformational  rules  to  the 
deep  structure  of  a  sentence,  verbs  are  sometimes  deleted  from  the  surface 
representation  of  a  clause.  These  verbless  clauses  include  comparisons, 
postposed  locative  and  adjective  phrases,  appositives,  and  so  on. 

Identifying  Clauses.  As  stated  earlier,  the  identification  of  clauses 
is  essentially  the  identification  of  verb  phrases.  The  following  sentences 
can  easily  be  analyzed  into  their  clause  components: 

(1)  They  attended  the  conference  that  was  sponsored  by  MTA. 

(2)  The  investigator  intended  to  analyze  linguistic  complexity. 

(3)  They  arrived  for  an  8:00  am  session  drinking  coffee. 

(4)  If  you  don't  understand  this,  you  can  ask  questions. 

Each  of  the  examples  above  contains  two  clauses.  In  the  analysis  of  clauses, 
care  must  be  taken  not  to  separate  auxiliary  verbs  from  the  lexical  verbs 
to which they belong. Thus, in sentence 1 "was," the passive marker, is
included as part of the verb phrase "was sponsored;" in sentence 4 "don't"
and "can" are included as part of the verb phrases "don't understand" and
"can ask." The auxiliary verbs are: (a) auxiliary "be" in the progressive
tenses and the passive, (b) modal verbs such as "will," "can," "must,"
"should," etc., (c) the perfect marker "have," and (d) auxiliary "do."

A verb phrase may have more than one auxiliary: "I didn't see the
manuscript, but it might have been being typed."

Verbless  clauses  are  only  a  little  more  difficult  to  uncover.  The 
following examples do not exhaust the possible types of verbless clauses:

(a)  comparative 

(5)  This  year's  conference  is  bigger  than  last  year's  (was). 

(6)  This  workshop  lasts  as  long  as  that  one  (lasts). 

(b)  postposed  adjective  and  locative  phrases 

(7)  They  attended  the  conference  (that  was)  sponsored  by  MTA. 

(8) The papers (which are) under the heading 'Test Development'
look interesting.

(c) appositives

(9) The headquarters of MTA, (which is) the sponsor of this
conference, are located in Virginia.


Computing the Style Indices. Following the conventions of traditional
grammar, one of the clauses is designated the main clause. In sentence
1, the main clause is "They attended the conference;" in sentence 4 the
main clause is "you can ask questions." Generally, the main clause contains
the subject of the sentence and the verb phrase closest to it.

After  identifying  the  clauses,  the  analyst  rewrites  them,  a  clause  to 
a  line,  indicating  the  main  clause  with  the  letter  A,  clauses  subordinate 
to  the  main  clause  with  the  letter  B,  clauses  subordinate  to  a  B  clause  with 
the  letter  C,  etc.  This  process  is  called  reduction.  The  sample  sentences 
may  be  reduced  as  follows: 

(1)  #A They attended the conference
      B that was sponsored by MTA

(2)  #A The investigator intended
      B to analyze linguistic complexity

(3)  #A They arrived for an 8:00 am session
      B drinking coffee

(4)  #B If you don't understand this
      A you can ask questions
The symbol "#" is used to indicate the beginning of a sentence.

Compound  sentences  may  also  be  reduced: 

(10)  #A  The  investigator  presented  a  paper 
+A  and  the  participants  asked  questions 

The symbol "+" indicates the boundary of an information block.

When the clauses have been identified and reduced, main and subordinate
clauses  indicated  by  the  appropriate  letter,  the  analyst  may  begin  the 
second  step,  the  computation  of  language  complexity.  Cook  describes  three 
indices  of  complexity:  (1)  average  sentence  length  (ASL)  —  the  average  number 
of  clauses  per  sentence,  (2)  average  block  length  (ABL)  —  the  average  number 
of  clauses  per  information  block,  and  (3)  average  clause  depth  (ACD)  —  the 
average  depth  of  embedded  clauses.  The  "depth"  of  an  embedded  clause  is 
determined  by  its  distance  from  the  main  clause;  that  is,  a  clause  desig¬ 
nated  as  a  C  clause  in  the  process  of  reduction  is  more  deeply  embedded 
(farther from the A clause) than a B clause. Clauses have been assigned
arbitrarily the following values: A = 1, B = 2, C = 3, etc. The Appendix
presents the analysis of a sample text.
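
The three indices can be computed mechanically once a text has been reduced. The sketch below is one possible encoding, not Cook's own procedure: each clause is represented as a (marker, letter) pair, where the marker is "#" for a clause that opens a sentence, "+" for one that opens a further information block, and an empty string otherwise. The clause sequence used as input is the ten-clause sample text analyzed in the Appendix; the function and variable names are illustrative.

```python
# Depth values as defined in the text: A = 1, B = 2, C = 3, and so on.
DEPTH_VALUES = {letter: value for value, letter in enumerate("ABCDEFGH", start=1)}


def style_indices(clauses):
    """Compute ASL, ABL, and ACD from a reduced text.

    clauses is a list of (marker, letter) pairs: marker is "#" (sentence
    start), "+" (start of a further information block), or ""; letter is
    the clause level (A for a main clause, B, C, ... for embedded ones).
    """
    n_clauses = len(clauses)
    n_sentences = sum(1 for marker, _ in clauses if marker == "#")
    n_blocks = sum(1 for _, letter in clauses if letter == "A")
    total_depth = sum(DEPTH_VALUES[letter] for _, letter in clauses)
    return {
        "ASL": n_clauses / n_sentences,  # clauses per sentence
        "ABL": n_clauses / n_blocks,     # clauses per information block
        "ACD": total_depth / n_clauses,  # mean depth value per clause
    }


# Clause sequence of the sample text in the Appendix (clauses 1 to 10).
appendix_text = [
    ("#", "A"), ("", "B"),
    ("#", "A"), ("", "B"), ("", "B"), ("+", "A"),
    ("#", "A"), ("", "B"), ("", "C"), ("", "D"),
]

indices = style_indices(appendix_text)
print({name: round(value, 2) for name, value in indices.items()})
```

Counting sentences by the "#" marker rather than by A clauses matters for sentences such as example (4), which opens with a subordinate clause.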

Applications 

The  clause  analysis  method  has  been  used  in  PRDC  in  three  different 
types  of  tests:  syllogistic  reasoning,  arithmetic  reasoning,  and  reading 
comprehension. 

In  the  syllogistic  reasoning  experiment,  the  technique  permitted  the 
investigator  to  systematically  control  the  syntactic  configuration  of  four 
series of items. Two of the series, labeled "simple syntax," consisted of
sentences containing no embedded clauses, and two series, labeled "complex
syntax," consisted of sentences containing various numbers of clauses. The
syntactic configuration of all of the "simple syntax" items was identical.
Individual  items  in  the  two  series  labeled  "complex  syntax"  also  had  iden¬ 
tical  syntactic  configurations  (e.g.,  item  1  had  the  same  number  and  depth 
of  clauses  in  each  version).  The  complex  syntax  items  were  constructed  by 
matching  the  type  of  clauses  (A,  B,  C,  etc.)  in  each  version  of  an  item. 

With the number and depth of the clauses held constant, the investigator
is able to study the effects of other linguistic variables on test performance,
in this case the effects of vocabulary difficulty.

A  method  of  reducing  the  syntactic  complexity  of  a  test  of  arithmetic 
reasoning  was  developed  using  clause  analysis  as  a  base.  The  purpose  of 
the experiment was to attempt to minimize the verbal load of the test and,
thus, minimize discrimination against deaf applicants, who generally do not
perform  as  well  as  hearing  subjects  on  tests  of  arithmetic  reasoning  because 
of  the  high  verbal  content  of  this  item  type  (Stunkel,  1957,  for  example). 
Preliminary  results  show  fewer  omissions  on  the  modified  items,  indicating 
that  the  deaf  subjects  are  spending  less  time  trying  to  decode  the  verbal 
component  of  the  items. 

Currently  we  are  carrying  out  a  study  to  compare  the  efficiency  and 
accuracy  of  clause  analysis  in  describing  the  difficulty  of  reading  materials 
with  that  of  the  Flesch  Reading  Ease  formula.  Correlations  so  far  are 
in the high sixties. Clause analysis gives the item writer a coherent framework
of linguistic structure in which to work. For example, an item writer
trying  to  construct  items  within  the  reading  level  of  a  specific  job  would 
know  to  avoid  (or  maximize)  the  use  of  deeply  embedded  clauses.  Clause 
analysis  provides  a  linguistic  guide  to  be  used  during  the  item  writing 
process  itself,  as  opposed  to  the  Flesch,  which  serves  principally  as  a 
post-facto  index  of  reading  level. 
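
For comparison, the Flesch Reading Ease score depends only on surface counts of words, sentences, and syllables, which is why it is applied after a passage has been written. The sketch below uses the commonly cited form of the Flesch (1948) formula; the function name and the counts in the example are hypothetical, and syllable counting itself is left to the user.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease from raw counts (higher scores mean easier text).

    Uses the commonly cited form of the formula:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(total_words=100, total_sentences=5, total_syllables=150)
print(f"Reading Ease: {score:.1f}")
```

Nothing in these counts reflects clause structure, so two passages with the same score can differ widely in embedding depth.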
Because it is easily and quickly calculated, clause analysis would be
a valuable tool in determining the reading level of jobs and in developing
tests with a demonstrably job-related verbal component.


References 


Bever, Thomas G. Perception, thought and language. In Roy O. Freedle
& John B. Carroll (Eds.), Language comprehension and the acquisition
of knowledge. Washington, D.C.: V.H. Winston & Sons, Inc.,
1972.

Bloom, L.M. Language development: Form and function in emerging grammars.
Cambridge, Mass.: M.I.T. Press, 1970.

Cook, Walter A., S.J. Stylistics: Measuring style complexity.
Language and Linguistics Working Papers, 1975, 1^, 106-120.

Flesch,  R.  A  new  readability  yardstick.  Journal  of  Applied  Psychology, 
1948,  32,  221-233. 

Stunkel,  E.R.  The  performance  of  deaf  and  hearing  college  students  on 
verbal  and  nonverbal  intelligence  tests.  American  Annals  of  the 
Deaf,  1957,  102,  342-355. 




APPENDIX 


Clause  Analysis  of  a  Sample  Text 


 1.  #A  Chomsky's position not only is unique within linguistics at the
         present time
 2.   B  but is probably unprecedented in the whole history of the subject.
 3.  #A  His first book (4) (5) revolutionized the scientific study of language
 4.   B  published in 1957
 5.   B  short and relatively non-technical though it was
 6.  +A  and now he speaks with unrivaled authority on all aspects of grammatical
         theory.
 7.  #A  That is not
 8.   B  to say, of course,
 9.   C  that all linguists, or even a majority of them, have accepted the
         theory of transformational grammar
10.   D  that Chomsky put forward some thirteen years ago in Syntactic
         Structures.*


The symbol "#" indicates sentence boundaries.
The symbol "+" indicates block boundaries.


Computation  of  Indices 

1.  Average Sentence Length = number of clauses ÷ number of sentences
    = 10 ÷ 3 = 3.33

2.  Average Block Length = number of clauses ÷ number of main (A) clauses
    = 10 ÷ 4 = 2.50

3.  Average Clause Depth = total value of clauses ÷ number of clauses
    = 19 ÷ 10 = 1.90


*  From  Lyons,  John.  Noam  Chomsky.  New  York:  The  Viking  Press,  1970 
The purpose of this research was to investigate how the important
area  of  unit  effectiveness  is  assessed  in  the  Army.  Data  was  collected 
from  a  sample  of  senior  Army  commanders  regarding  their  perceptions  of 
existing  standard  Army  measures  of  battalion  effectiveness.  These 
measures  naturally  classify  into  three  groups:  (1)  command  indicators 
(e.g., AWOL rates, Articles 15); (2) readiness measures (e.g., equipment
rated ready, annual general inspections); and (3) the personal
judgments  of  subordinate  Army  leaders.  Senior  Army  leaders  chose  those 
measures  from  all  of  these  groups  which  provided  for  them  the  most 
accurate  picture  of  a  battalion's  effectiveness.  It  was  found  that 
military  leaders  not  only  have  predetermined  attitudes  toward  all 
existing  effectiveness  measures,  but  that  even  when  this  rater  bias  is 
controlled,  there  exists  a  definite  preference  for  specific  groups  of 
measures.  The  command  indicators  were  found  to  have  the  least  per¬ 
ceived  validity  and  utility  for  Army  leaders,  while  personal  judgments 
and  readiness  measures  were  rated  significantly  higher  for  their 
credibility in assessing battalion effectiveness.
COMMANDERS' ASSESSMENT OF UNIT EFFECTIVENESS MEASURES
Susan  E.  Kerner-Hoeg  and  Francis  E.  O’Mara 
INTRODUCTION 

A critical facet of successful operations by any organization is the
continual  monitoring  of  organizational  performance.  The  information  gathered 
through  such  activity  is  of  importance  for  the  development  of  realistic  goals, 
for  planning  optimal  strategies  for  achieving  these  goals,  and  for  the  iden¬ 
tification  and  remediation  of  organizational  deficiencies.  Nowhere  is  organ¬ 
izational  effectiveness  measurement  more  vital  than  in  the  military,  given 
the  potentially  disastrous  consequences  of  misjudging  national  defense  capa¬ 
bilities.  Further,  the  estimate  of  aggregate  military  potential  has  broader 
national  implications  inasmuch  as  such  estimates  influence  decisions  in  other 
areas  of  national  concern,  such  as  development  of  Federal  budget  priorities 
and  the  formulation  of  U.S.  foreign  policy.  Thus,  the  development  of  means 
to  accurately  assess  the  strengths  and  deficiencies  of  military  units  is  a 
vital  concern  both  for  the  military  as  well  as  for  our  larger  society. 

Given  the  importance  of  measuring  unit  effectiveness,  it  is  not  sur¬ 
prising  that  the  Army  has  traditionally  monitored  quantified  measures  of 
many  facets  of  unit  operations  at  all  echelons.  These  measures  have  encom¬ 
passed  such  disparate  areas  as  the  compilation  of  the  maintenance  status  of 
mission-essential  equipment  to  the  tallying  of  chapel  attendance  by  unit 
personnel.  The  manifest  importance  of  unit  effectiveness  assessment  is 
evidenced  by  the  command  attention  paid  to  it  and  the  diversity  of  measures 
employed  in  this  assessment.  Despite  such  attention,  however,  there  has 
been  a  growing  body  of  criticism  regarding  the  accuracy  and  adequacy  of 
current  methods  of  monitoring  the  effectiveness  of  Army  units. 

Much  of  this  criticism  has  centered  around  reported  deficiencies  in 
systems of unit readiness reporting (Barzily, Catalogue, and Marlow, 1980;
Bowser,  1976;  Robinson,  1980;  Sorley,  1980;  U.S.  Army  War  College,  1976).  This 
degree  of  attention  is  appropriate  since  this  system  constitutes  the  major  means 
by  which  higher  echelons  monitor  the  effectiveness  of  Army  battalions  and 
separate  companies.  Even  though  the  data  from  this  reporting  system  provide 
major  input  to  the  development  of  Army  contingency  plans  and  guide  high  level 
resource  allocation  decisions,  the  consensus  of  opinion  of  those  who  have 
examined this system is that it is seriously faulted. As an example, in the
Army  War  College  study  (U.S.  Army  War  College,  1976),  questionnaires  measur¬ 
ing  perceptions  regarding  the  Unit  Status  Report  (USR)  were  administered  to 
approximately  2100  Army  personnel.  A  full  70  percent  of  this  sample  reported 
that  the  USR  does  not  reflect  the  true  readiness  condition  of  a  unit.  This 
opinion  was  likewise  voiced  in  the  course  of  interviews  with  over  1200  per¬ 
sonnel  conducted  as  another  component  of  this  same  study.  In  addition  to 
some  technical  problems  in  the  actual  computation  of  indices  contained  on 
the  USR,  this  study  found  two  major  factors  undermining  the  accuracy  and 
credibility  of  this  reporting  system.  The  first  factor  concerned  the  sub¬ 
stantial  degree  of  latitude  for  subjective  interpretation  of  unit  conditions 
that was permitted in filling out the Unit Status Report. As an example,
the Training Readiness Condition index, which constitutes a major component
of the unit's readiness, is based totally upon the unit commander's subjective
estimate  of  the  number  of  weeks  of  training  the  unit  would  need  to  be  fully 
ready  for  combat.  It  was  the  conclusion  of  this  study  that  "The  training 
portion  of  the  USR  was  too  subjective  to  be  anything  more  than  a  wishful- 
thinking  guess.  The  training  REDCONs  being  reported  are  therefore  regarded 
as  both  inflated  and  invalid  by  a  sizable  majority  of  those  interviewed,  par¬ 
ticularly  at  company  level."  This  opinion  has  also  been  advanced  by  others 
who  have  examined  the  validity  of  current  unit  readiness  reporting  procedures 
(Robinson, 1980; Ross, Murphy, March, and Robinson, 1979; U.S. Army Concepts
Analysis  Agency,  1975). 

The  second  major  problem  in  the  USR  uncovered  in  the  Army  War  College 
study  concerned  a  conviction  by  those  surveyed  and  interviewed  that  there  was 
pressure  on  unit  commanders  to  portray  the  unit's  capabilities  in  the  best 
possible  light  even  to  the  extent  of  masking  genuine  unit  deficiencies.  This 
problem  area  is  one  which  bodes  ill  not  only  for  the  validity  of  the  Unit 
Status  Report  itself,  but  also  for  the  validity  of  any  systematized  quantifi¬ 
cation  of  unit  readiness  indices.  A  key  conclusion  of  this  study  therefore 
was  that  the  current  Unit  Status  Report  reflected  Army  units  not  as  they 
actually  were  but  rather  the  units  as  all  would  wish  them  to  be. 


Since  the  publication  of  that  study,  efforts  have  been  made  to  revise 
and  improve  unit  readiness  reporting  by  increasing  the  reliance  on  objective 
measurements  of  unit  conditions  rather  than  on  the  subjective  interpretation 
of  the  unit's  capabilities.  As  of  1980,  however,  Sorley  held  that  the  Unit 
Status  Reporting  system  continued  to  suffer  major  deficiencies.  The  most 
central  of  these  deficiencies  continued  to  be  the  need  to  separate  the  pro¬ 
cess  of  evaluating  and  monitoring  unit  effectiveness  from  the  process  by 
which  the  performance  of  Army  officers  is  evaluated.  Sorley  believes  that 
only  by  removing  the  responsibility  for  unit  readiness  reporting  from  the 
chain  of  command,  which  likewise  evaluates  the  performance  of  the  individuals 
who  provide  unit  effectiveness  data,  can  the  real  or  perceived  pressure  to 
inflate  estimates  of  unit  effectiveness  be  removed. 

Sorley  further  holds  that  the  Unit  Status  Report  has  excluded  variables 
which are essential to combat readiness and therefore the USR can only partially
reflect the total capability of the unit. Factors such as unit cohesion
and the turnover and competence of key unit personnel are those which
he  feels  are  important  contributors  to  total  unit  capability  but  which  are 
not  now  employed  in  estimating  the  unit's  effectiveness.  Sorley  likewise 
suggests  that  the  information  contained  on  the  USR  be  complemented  with  the 
professional  judgment  of  individuals  familiar  with  the  unit.  This,  of  course, 
would only be feasible where this judgment could be rendered frankly and openly.

The  evaluation  of  unit  effectiveness  in  the  Army  is  not  restricted  to 
the  USR.  The  Army  has  had  a  long  tradition  of  monitoring  an  extensive  series 
of  variables  which  purportedly  reflect  the  state  of  morale  and  discipline  in 
the unit. Known collectively as "command indicators" or "traditional indicators,"
this set of unit measures typically includes such variables as
reenlistment rates, crime rates, and indices associated with the administration
of  military  justice.  Unlike  the  USR  measures,  these  indices  are  not  systemat¬ 
ically  reported  at  the  unit  level  to  the  higher  echelons  of  the  Army  command 
structure.  However,  unit  measures  on  these  variables  are  used  frequently  at 
the  local  level  as  indicants  of  unit  conditions  and  problems.  Sorley  (1979) 
has  been  critical  of  the  use  of  such  measures  inasmuch  as  he  sees  them  leading 
to  a  "management  by  statistics"  in  which  those  factors  which  are  more  readily 
quantifiable  are  given  greater  command  emphasis  than  those  which  are  more  dif¬ 
ficult  to  quantify,  but  which  more  substantively  support  and  reflect  unit 
effectiveness.  Too  often,  he  feels,  command  attention  is  expended  on  "getting 
the  numbers  right"  in  such  areas  of  questionable  military  value  as  motor  vehicle 
accident  rates  or  the  number  of  letters  of  indebtedness  among  unit  personnel.  This 
occurs at the expense of command attention to such areas as unit training
and equipment maintenance, which are more directly supportive of the unit's
mission.  The  position  underlying  his  assertions  is  that  statistical  indices 
of  unit  operations,  particularly  those  relevant  to  the  personnel  area,  are  of 
questionable  utility  in  assessing  areas  pertinent  to  unit  effectiveness.  This 
is  somewhat  inconsistent  with  his  position  (Sorley,  1980)  that  the  USR  be  sup¬ 
plemented  with  measures  in  such  areas  as  drug  abuse,  race  relations,  and  the 
alienation  and  commitment  of  the  unit  personnel.  Clearly,  some  statistical 
indices  are  more  germane  to  unit  effectiveness  assessment  than  others.  The 
question  remains  unresolved  as  to  the  identity  of  these  measures.  The  prolif¬ 
eration  of  statistical  indices  used  to  monitor  unit  functioning  has  been  fed 
by  the  variation  in  opinion  as  to  which  of  the  wide  variety  of  possible  mea¬ 
sures  are  the  most  accurate  indicants  of  unit  capability.  This  proliferation 
has  in  turn  led  to  many  of  the  abuses  and  problems  which  have  been  identified 
in the literature.

The  purpose  of  this  research  is  to  contribute  to  the  resolution  of  these 
problems  by  examining  the  value  of  the  most  typically  employed  statistical 
indices  in  reflecting  unit  effectiveness.  To  date  there  has  been  no  system¬ 
atic  examination  across  the  broad  spectrum  of  unit  effectiveness  measures 
which  would  permit  a  determination  of  the  relative  value  of  these  measures. 

The  absence  of  such  information  leaves  unchallenged  the  possible  reliance  on 
inaccurate or incomplete assessment of unit effectiveness and thus the development
of priorities based on apparent rather than real problems.


METHOD 

Forty-eight  battalion  commanders,  twenty-eight  brigade  commanders,  and 
eight  general  officers  located  at  six  CONUS  installations  were  interviewed 
on  the  topic  of  battalion  effectiveness.  During  the  approximately  one-hour- 
long  interviews,  each  subject  was  asked  to  discuss  the  most  pressing  manage¬ 
ment  problems  confronting  him  in  maintaining  readiness,  to  operationally 
define  battalion  effectiveness,  and  to  evaluate  the  performance  of  his  sub¬ 
ordinate  battalions.  Each  subject  was  also  asked  to  assess  various  given 
measures  of  battalion  effectiveness.  These  measures  can  be  classified 
into  three  groups:  Readiness  Measures,  Command  Indicators,  and  Personal 
Judgments. 


The first of these groups, Readiness Measures, is a relatively direct
assessment of a unit's capability to perform its mission. This group of
measures includes the REDCON ratings from the USR, the percentage of unit
equipment that is operational (as gleaned from USR data), and ARTEP
and AGI results. A listing and definition of each of these measures can be
found  in  Figure  1. 

Command  Indicators  are  those  measures  which  are  traditionally  held  to 
reflect  a  unit's  state  of  morale  and  discipline.  The  specific  Command  Indi¬ 
cators  used  in  this  study  are  listed  and  defined  in  Figure  2. 

Personal  Judgments  are  those  opinions  of  battalion  effectiveness  held  by 
individuals  at  various  echelons.  The  ascending  hierarchy  of  authority  and 
the  six  specific  levels  used  in  this  study  are  as  follows:  (1)  Service  mem¬ 
bers  in  the  battalion  (SM);  (2)  Noncommissioned  officers  in  the  battalion 
(NCOs);  (3)  Company  grade  officers  in  the  battalion;  (4)  Brigade  Commanders; 
(5)  Assistant  Division  Commanders;  and  (6)  Division  Commanders. 

Subjects  were  asked  to  indicate  "how  accurate  an  assessment  of  battalion 
effectiveness  would  be  if  it  were  based  on  any  single  piece  of  information 
from  the  list  provided."  A  measure  providing  complete  accuracy  would  be 
rated 100%, while a measure providing no information on unit effectiveness
would be rated 0%.

Subjects  were  further  asked  to  choose  from  the  given  list  of  measures 
the  five  which,  in  combination,  would  provide  "the  most  complete  picture 
of  a  battalion's  overall  effectiveness." 


RESULTS 

Analyses  of  these  data  began  with  an  examination  of  the  degree  to  which 
there  was  a  difference  in  the  perceived  validity  of  the  effectiveness  measures 
across  positions  (i.e.,  battalion  commander  vs.  brigade  commander  vs.  general 
officer). A three-level one-way ANOVA was therefore performed on the validity
ratings given to each of the unit effectiveness measures. Of the twenty-two
measures tested, only one showed a significant position difference (Drug
Arrest Rate). It was concluded that there existed no consistent position differences
in the perceived validity of the unit effectiveness measures, since
such  a  proportion  of  significant  results  is  essentially  what  would  be  expected 
from  chance  alone.  Accordingly,  the  data  from  the  three  groups  were  combined  in 
all  further  analyses. 
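
The position comparison described above can be sketched as follows. This is a minimal illustration only: the ratings are hypothetical, the function name one_way_anova_f is not from the study, and the .05 significance level is an assumption, since the text does not state the level used. The last lines show why a single significant result out of twenty-two tests is about what chance alone would produce.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of rating lists."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical validity ratings of one measure, by position of the rater
battalion_cdrs = [70, 80, 75, 65]
brigade_cdrs = [60, 70, 65, 75]
general_officers = [80, 85, 70, 75]

f_stat, df_b, df_w = one_way_anova_f(
    [battalion_cdrs, brigade_cdrs, general_officers]
)

# Expected number of "significant" tests if no true differences exist
n_tests = 22
alpha = 0.05  # assumed level; not stated in the text
expected_by_chance = n_tests * alpha
```

With the study's 84 subjects (48 + 28 + 8) and complete responses, the same function would yield 2 and 81 degrees of freedom for each measure.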

The  mean  accuracy  ratings  assigned  to  each  measure  are  rank  ordered  and 
presented  in  Table  1.  As  shown,  a  wide  range  of  mean  ratings  was  obtained, 
varying from 72.5% accuracy attributed to ARTEP results to an accuracy rating
of only 29.2% for desertion rates. In general, the Readiness Measures and the
Personal Judgments were given the highest validity as measures which individually
render an accurate assessment of a battalion's effectiveness.

Table 2 presents a rank ordering of the frequency with which each
measure was included in the group of five providing the most complete
picture of a battalion's effectiveness. The sharp drop in the frequency
of selection after the fourth measure indicates that these first four
are important measures in providing an overall effectiveness assessment.
Measures of readiness, specifically the ARTEP and AGI, and the personal
judgments  of  those  in  the  unit,  specifically  the  company  grade  officers 
and  NCOs,  were  seen  to  provide  the  most  information  about  battalion  effec¬ 
tiveness.  Further,  the  top  four  measures  in  Table  2  are  the  same  as  the 
top  four  in  Table  1,  implying  that  these  measures  are  the  most  valid, 
whether  they  are  considered  individually  or  in  combination.  For  the  most 
part, the Command Indicators are located at the bottom of the
continuum  on  both  Tables  1  and  2. 
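
As a brief arithmetic check, the percentages reported for the four leading measures in Table 2 can be reproduced from the selection counts. The sketch assumes that all 84 interviewed subjects (48 battalion commanders, 28 brigade commanders, 8 general officers) made a selection; the text does not state the denominator explicitly.

```python
# Sample size taken from the METHOD section; using it as the
# denominator is an assumption.
n_subjects = 48 + 28 + 8

# Selection counts for the four leading measures in Table 2
counts = {
    "ARTEP": 63,
    "NCO's judgment": 54,
    "AGI": 48,
    "Company grade officer's judgment": 44,
}

# Percentage of subjects including each measure in their group of five
percentages = {
    name: round(100 * count / n_subjects) for name, count in counts.items()
}
```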

As  seen  in  Table  1,  all  measures  within  each  group  (Command  Indicators, 
Personal  Judgments,  Readiness  Measures)  tended  to  be  assigned  similar  accuracy 
ratings.  Thus,  there  appears  to  have  been  a  consistent  rating  applied  to  all 
measures  within  each  of  the  three  groups  of  measures.  That  is,  while 
the  judgments  of  unit  NCOs  may  have  been  accorded  higher  validity  by 
subjects  than  the  judgment  of  division  commanders,  the  fact  that  both 
measures  entail  personal  judgments  tended  to  produce  very  similar  validity 
ratings  for  both  measures.  To  test  this,  coefficient  alphas  were  computed 
for  each  of  the  three  groups  to  determine  the  internal  consistency  of 
this  grouping.  These  reliability  measures  are  presented  in  Table  3A. 

All three coefficient alphas are above .89, revealing a high degree of
internal  consistency  within  each  group  of  effectiveness  measures. 
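
For readers unfamiliar with coefficient alpha, a minimal sketch of the standard formula follows. The function name and the ratings are hypothetical, and the sketch is not the authors' computation; it only shows how the internal consistency of one group of measures could be obtained from a subjects-by-measures table of ratings.

```python
from statistics import variance

def coefficient_alpha(ratings):
    """Coefficient alpha for one group of measures.

    ratings: list of rows, one per subject; each row holds that
    subject's ratings of the k measures in the group.
    """
    k = len(ratings[0])
    # Variance of each measure across subjects
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Variance of each subject's summed rating across subjects
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings by four subjects of three measures in one group
example_ratings = [
    [90, 85, 95],
    [60, 65, 55],
    [30, 40, 35],
    [75, 70, 80],
]
alpha_value = coefficient_alpha(example_ratings)
```

Alpha approaches 1 when subjects who rate one measure in the group highly also rate the other measures in that group highly.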

In  order  to  detect  whether  there  was  a  consistent  style  of  rating 
(i.e.,  preference  of  one  group  of  measures  to  the  exclusion  of  others), 
the relationships among the three groups of measures were examined. A
mean  rating  was  computed  for  each  subject  for  each  of  three  groups  of 
measures.  Pearson  correlations  were  in  turn  computed  among  these  mean 
ratings.  These  intergroup  correlations  are  presented  in  Table  4A.  A 
substantial  positive  correlation  exists  among  the  three  groups,  suggesting 
that  even  in  the  presence  of  a  wide  variation  of  mean  ratings  (as  shown 
in Table 1), there was a tendency for subjects to display a rater bias
reflecting  a  global  impression  of  the  validity  of  any  formal  effectiveness 
measure.  Thus,  a  subject  who  gave  high  ratings  to  one  group  of  measures 
likewise  gave  high  ratings  to  the  other  two  groups.  As  an  extreme  example, 
one  subject's  accuracy  ratings  of  the  individual  measures  ranged  from 
90-100, while another's ranged from 0-16.
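
The two steps just described (a mean rating per subject per group, then Pearson correlations among those means) can be sketched as below. The subjects and ratings are hypothetical and each group is cut to two measures for brevity; pearson_r is written out by hand so that the sketch depends on no particular library version.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

GROUPS = ("readiness", "command", "personal")

# Hypothetical ratings: one dict per subject, ratings listed by group
subjects = [
    {"readiness": [95, 90], "command": [90, 92], "personal": [100, 95]},
    {"readiness": [10, 16], "command": [0, 5], "personal": [12, 8]},
    {"readiness": [70, 60], "command": [40, 30], "personal": [65, 75]},
    {"readiness": [55, 65], "command": [35, 45], "personal": [60, 50]},
]

# Mean rating per subject for each group of measures
group_means = {g: [mean(s[g]) for s in subjects] for g in GROUPS}

r_readiness_command = pearson_r(group_means["readiness"], group_means["command"])
```

In this illustration the first two subjects mimic the extreme raters mentioned in the text, which is what drives the strongly positive correlation.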

Thus,  to  correct  for  this  rater  bias,  a  set  of  corrected  ratings  was 
computed  for  each  subject  on  each  measure.  This  was  accomplished  by  first 
computing  the  average  rating  on  all  measures  given  by  each  subject.  Each 
subject's average rating was then subtracted from his original rating on
each measure to establish a set of corrected ratings for each subject.
The rank ordering for these corrected ratings agrees totally with the rank
ordering of the uncorrected ratings (Table 1), indicating that these earlier
results  were  not  artifacts  of  subjects'  biases  toward  effectiveness  measures 
in  general.  Thus,  the  ordering  of  the  ratings  for  those  with  high  confidence 
in  unit  effectiveness  measures  is  the  same  as  that  for  those  with  low  confidence 
in  unit  effectiveness  measures.  The  existence  of  this  rater  bias,  however,  does 
raise the possibility that the high internal consistencies in each of the three
groups of measures were not due to these existing as natural groupings in the
minds of the raters, but rather were an artifact of the rater bias. It is
possible that the rater bias produced high intercorrelations among the ratings
given  to  all  of  the  unit  performance  measures.  This  would  in  turn  produce 
high  coefficient  alphas  for  the  ratings  given  to  the  measures  within  each  of 
the  groups.  This  interpretation  is  made  plausible  by  the  substantial  positive 
correlations among the mean ratings given to each group of measures (Table 4A).
To  test  this  interpretation,  coefficient  alphas  were  recomputed  for  each  of  the 
three  groups,  based  on  the  corrected  ratings.  These  reliability  measures  are 
presented  in  Table  3B.  Though  the  alphas  drop  slightly  from  those  based  on 
the uncorrected ratings, the internal consistency of the groups remains acceptable,
indicating that while these coefficient alphas were inflated by the rater
bias, they were not totally attributable to it.
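
The correction for rater bias described above amounts to subtracting each subject's own mean rating from every rating that subject gave. A minimal sketch with hypothetical subjects and only three measures follows; the function names are illustrative and not from the study.

```python
from statistics import mean

def correct_for_rater_bias(ratings):
    """Subtract each subject's mean rating from all of that subject's ratings.

    ratings: dict mapping subject -> dict mapping measure -> rating.
    """
    corrected = {}
    for subject, row in ratings.items():
        subject_mean = mean(row.values())
        corrected[subject] = {m: r - subject_mean for m, r in row.items()}
    return corrected

def rank_order(ratings):
    """Measures sorted from highest to lowest mean rating across subjects."""
    measures = next(iter(ratings.values())).keys()
    means = {m: mean(row[m] for row in ratings.values()) for m in measures}
    return sorted(means, key=means.get, reverse=True)

# Hypothetical ratings from a generous, a severe, and a moderate rater
ratings = {
    "high_rater": {"ARTEP": 100, "AGI": 95, "Desertion rate": 90},
    "low_rater": {"ARTEP": 16, "AGI": 8, "Desertion rate": 0},
    "mid_rater": {"ARTEP": 70, "AGI": 75, "Desertion rate": 20},
}
corrected = correct_for_rater_bias(ratings)
```

In this sketch, where every subject rates every measure, the ordering of the measure means is the same before and after the correction, which matches the agreement reported in the text.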

Intercorrelations  among  the  three  groups  of  measures  were  next  computed, 
based  on  the  corrected  ratings.  These  intercorrelations  are  presented  in 
Table  4B,  showing  correlations  which  are  negative,  in  contrast  to  those  based 
on uncorrected ratings (Table 4A). The use of difference scores to correct
for  the  rater  bias  reveals  a  clearer  picture  of  Army  leaders'  preferences 
for  different  types  of  measures.  That  is,  a  definite  tendency  to  favor  one 
type  of  measure  over  the  other  two  is  now  displayed. 
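
The change of sign between Tables 4A and 4B can be illustrated with a small sketch. The ratings below are hypothetical, each group is reduced to two measures, and the helper names are not from the study. Because each subject's corrected ratings sum to zero, some negative association among the groups may arise from the correction itself; the sketch makes no attempt to separate that effect from genuine preference.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

GROUPS = ("readiness", "command", "personal")

# Hypothetical ratings: one dict per subject, ratings listed by group
subjects = [
    {"readiness": [95, 90], "command": [90, 92], "personal": [100, 95]},
    {"readiness": [10, 16], "command": [0, 5], "personal": [12, 8]},
    {"readiness": [70, 60], "command": [40, 30], "personal": [65, 75]},
    {"readiness": [55, 65], "command": [35, 45], "personal": [60, 50]},
]

def group_means(subject, offset=0.0):
    """Mean rating per group, optionally shifted by a per-subject offset."""
    return {g: mean(subject[g]) - offset for g in GROUPS}

def overall_mean(subject):
    """The subject's mean rating over all measures in all groups."""
    return mean(r for g in GROUPS for r in subject[g])

uncorrected = [group_means(s) for s in subjects]
corrected = [group_means(s, offset=overall_mean(s)) for s in subjects]

def intergroup_r(rows, a, b):
    return pearson_r([row[a] for row in rows], [row[b] for row in rows])

r_before = intergroup_r(uncorrected, "readiness", "command")
r_after = intergroup_r(corrected, "readiness", "command")
```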


DISCUSSION 


The  results  of  this  research  indicate  that  the  evaluation  of  the  existing 
Army  battalion  effectiveness  measures  has  two  facets.  First,  there  is  the 
attitude military leaders have towards the general category of formal battalion
effectiveness measures. Wide variation was seen from commander to
commander in this attitude, with some commanders attributing very little credibility
to any of the battalion effectiveness measures. Second, there appears
to exist in the perceptions of senior commanders a distinct typology of battalion
effectiveness measures. The wide spectrum of measures studied in this
research  broke  down  into  only  three  types,  or  groups,  of  measures  in  the  eyes 
of  the  interviewed  commanders — Readiness  Measures,  including  formal  evaluations 
of  unit  capability;  Personal  Judgments,  the  estimation  of  battalion  effective¬ 
ness  by  individuals  within  the  unit  and  above  the  battalion  in  the  chain  of 
command;  and  Command  Indicators,  which  include  traditional  measures  of  unit 
morale  and  discipline.  These  groups  were  seen  to  have  differing  degrees  of 
utility  as  indicants  of  battalion  effectiveness.  Readiness  Measures  and 
Personal Judgments were seen to be similar in value whereas Command Indicators
were seen to have substantially less merit as indices of unit effectiveness.

These findings are very much in line with those of other studies both in military
(Bowser, 1976) and in civilian organizations (Mahoney & Weitzel, 1969;
Weitzel,  Mahoney  &  Crandall,  1971)  which  show  that  managers  display  a  clear 
preference  for  operationally  oriented  measures  as  indices  of  organizational 
performance,  especially  in  contrast  to  measures  related  to  the  personnel  and 
human  relations  areas. 

From  these  results,  at  least  two  points  are  salient  in  their  operational 
implications: the low value ascribed to Command Indicators as measures of
unit effectiveness and the high value ascribed to Personal Judgments.

The  first  of  these  two  results  calls  into  question  the  practice  of  using 
Command  Indicators  as  measures  of  unit  effectiveness.  This  finding  is 
surprising in light of the long history of utilization which such measures
have had in the military. Despite this tradition, the interviewed commanders
saw each of these measures as being ambiguous in its implications regarding
unit  effectiveness  since  a  high  score  on  any  of  these  measures  could  stem 
from  a  multitude  of  possible  causes.  Thus,  while  each  of  these  measures  may  truly 
reflect  the  status  of  various  personnel  areas  in  the  unit,  this  information 
by itself is held to be too ambiguous to judge the effectiveness of the command.
The use of such measures should therefore be restricted to the monitoring
of personnel trends and issues at levels above the unit. If such measures
are used at the unit level to assess overall unit effectiveness, the present
results suggest that these measures should be used only in combination
with other indicants of unit operation. In using Command Indicators to
assess  the  morale  and  the  state  of  discipline  of  a  given  unit,  especially 
high  or  low  scores  on  these  measures  should  not  be  taken  at  face  value  but 
rather  should  precipitate  a  fuller  investigation  into  the  reasons  behind 
the  scores.  Only  with  this  fuller  base  of  information  can  sound  conclusions 
be  reached  regarding  the  state  of  the  unit. 

The  relatively  high  validity  ascribed  to  the  Personal  Judgment  measures 
supports Sorley's contention that an ideal Unit Status Reporting system would
include the reporting of the professional judgment of battalion effectiveness
by key personnel. In light of the already well-documented pressure on unit
personnel to have USR data appear maximally positive, it is doubtful that
judgments rendered within the context of the present USR system would
substantially  contribute  to  a  fuller  assessment  of  unit  readiness  at  higher 
command  levels.  At  the  unit  level,  however,  the  high  ratings  given  to  NCO 
and  junior  officer  judgments  of  battalion  effectiveness  support  the  notion 
that  unit  commanders  can  best  assess  the  day-to-day  status  of  their  units 
by  relying  on  the  input  of  their  subordinates.  It  is  these  individuals 
who  have  the  most  detailed  and  direct  knowledge  of  unit  strengths  and 
weaknesses  inasmuch  as  it  is  they  who  directly  address  these  areas  in  the  course 
of  their  daily  duties.  For  such  a  rich  source  of  information  to  be  used  to  its 
fullest  potential,  however,  the  commander  must  have  the  skill  to  solicit  this 
information  in  a  frank  and  unbiased  form.  The  high  value  attributed  to  these 
estimates  of  unit  performance,  therefore,  underlines  the  importance  of 
leadership development for the Army. Without the requisite leadership
skills to involve his or her subordinate leaders in the development of the
total unit potential and to solicit and evaluate their frank judgment of
unit  capabilities,  unit  commanders  would  be  deprived  of  one  of  the  major 
instruments  of  monitoring  and  improving  the  operation  of  their  units. 

In the context of prior work in this area (Concepts Analysis Agency,
1975; Robinson, 1980; Ross et al., 1979; Sorley, 1979; US Army War College,
1976), the present results call for a careful re-examination of the structure
and process currently employed to evaluate unit effectiveness. There continues
to be a pressing need to amend and improve current unit effectiveness assessment
systems used in the military. The process is ongoing; it needs to continue
and accelerate.


Figure 1

READINESS MEASURES


OVERALL READINESS              A battalion's overall readiness status as
                               reported in the monthly Unit Status Report.

PERSONNEL READINESS            A battalion's personnel readiness status as
                               reported in the monthly Unit Status Report.

EQUIPMENT ON HAND              An index of the degree to which a battalion
                               possesses all authorized equipment, a reflection
                               of the battalion's supply system.

EQUIPMENT SERVICEABILITY       The maintenance status of a battalion's equipment,
                               a reflection of the battalion's maintenance
                               system.

EQUIPMENT ON HAND RATED READY  The proportion of equipment a battalion actually
                               has on hand that is operational.

ARTEP                          The percentage of the missions/tasks rated
                               "satisfactory" during a battalion's most recent
                               field training exercise.

AGI                            The percentage of the areas rated "satisfactory"
                               during a battalion's most recent annual general
                               inspection.


Figure 2

COMMAND INDICATORS


ARTICLES 15          The percentage of enlisted personnel administered
                     nonjudicial punishment (e.g., fines, reductions in grade)
                     during a given month.

COURTS MARTIAL       The percentage of enlisted personnel receiving a
                     court martial during a given month.

AWOL                 The percentage of enlisted personnel who were involved
                     in unexcused absences during a given month.

DESERTIONS           The percentage of enlisted personnel who deserted
                     during a given month.

FIRST TERM RE-UP     The percentage of a battalion's first-term reenlistment
                     objective that was achieved in a given month.

CAREER RE-UP         The percentage of a battalion's reenlistment objective
                     for career personnel that was achieved in a given
                     month.

CRIMES OF VIOLENCE   The percentage of a battalion's enlisted strength
                     involved in crimes of violence in a given month.

PROPERTY CRIMES      The percentage of a battalion's enlisted strength
                     involved in crimes against property in a given month.

DRUG ARRESTS         The percentage of a battalion's enlisted strength
                     arrested for drug and marijuana violations in a
                     given month.


Table 1


Accuracy Assessments of the Effectiveness Measures

Measure                                    Mean Accuracy

ARTEP                                      72.5%
Company grade officer's judgment           71.2%
NCO's judgment                             71.0%
AGI                                        66.7%
Equipment on hand rated ready              63.3%
Brigade commander's judgment               62.5%
Service member's judgment                  62.1%
Equipment status                           59.9%
Overall readiness                          57.2%
Assistant Division Commander's judgment    56.9%
Division Commander's judgment              53.7%
First-term reenlistment rate               51.3%
Personnel readiness                        50.7%
Equipment on hand                          46.6%
AWOL rate                                  45.9%
Career reenlistment rate                   43.8%
Crimes against property                    38.1%
Article 15s                                37.4%
Crimes of violence                         36.7%
Courts-martial rate                        32.5%
Drug/marijuana convictions                 30.8%
Desertion rate                             29.2%


Table 2


Group of Five Measures
Providing Most Complete Picture of Effectiveness

Measure                                    Number Choosing Measure

ARTEP                                      63  (75%)
NCO's judgment                             54  (64%)
AGI                                        48  (57%)
Company grade officer's judgment           44  (52%)
Brigade commander's judgment               28  (33%)
First-term reenlistment                    27  (32%)
Service member's judgment                  25  (29%)
Overall readiness rating                   25  (29%)
Equipment on hand rated ready              21  (25%)
AWOL rate                                  15  (18%)
Personnel readiness                        13  (15%)
Equipment status                           12  (14%)
Article 15s                                 9  (11%)
Career reenlistment rate                    7  (08%)
Division commander's judgment               7  (08%)
Assistant division commander's judgment     5  (06%)
Crimes against property                     4  (05%)
Drug/marijuana convictions                  3  (04%)
Crimes of violence                          3  (04%)
Equipment on hand                           3  (04%)
Courts-martial rate                         2  (02%)
Desertion rate                              1  (01%)


Table 3


Reliability Coefficients of Effectiveness Measure Groups

                      A. Uncorrected Ratings    B. Corrected Ratings

Readiness Measures
Command Indicators
Personal Judgments

Table 4

Effectiveness Measure Intergroup Correlations

A. Uncorrected Ratings

                      Readiness    Command       Personal
                      Measures     Indicators    Judgments

Readiness Measures    1.000
Command Indicators     .6537       1.000
Personal Judgments     .6297        .6802        1.000

B. Corrected Ratings

                      Readiness    Command       Personal
                      Measures     Indicators    Judgments

Readiness Measures    1.000
Command Indicators    -.6641       1.000
Personal Judgments    -.3391       -.4781        1.000


Kimmel, Melvin J. & O'Mara, Francis E., US Army Research Institute for
the Behavioral and Social Sciences, Alexandria, Virginia. (Wed. A.M.)


The Measurement of Morale

An instrument measuring organizational morale was constructed
from unit member satisfaction responses aggregated to the battalion
level. The data was gathered at three different points in time from
military personnel within 55 CONUS battalions. Significant positive
correlations between the satisfaction scores and an independent index
of affective orientation supported the widely held, but rarely tested
assumption that satisfaction measures are a true indicant of an individual's
affective orientation toward his/her unit. Analysis of the
instrument's psychometric properties showed it to be a reliable and
valid measure of morale as an organizational characteristic as distinct
from an individual level variable. Theoretical and applied implications
of these findings for the study of organizational morale in
military and nonmilitary units are discussed.


THE MEASUREMENT OF MORALE


Melvin  J.  Kimmel  and  Francis  E.  O'Mara 
US  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 


While  there  are  obviously  many  factors  contributing  to  mission  accom¬ 
plishment,  one  that  has  been  consistently  emphasized  by  military  strategists 
is  the  unit's  morale.  In  a  recent  review  of  the  morale  literature,  Motowidlo 
et  al  (1976)  concluded  that  "apparently,  hardly  any  military  Commander  doubts 
that morale is a potent force determining group effectiveness" (p. 52).
However, these authors also point out that despite its stated importance,
no coherent theories of organizational morale exist and there is virtually
no  systematic  empirical  literature  on  the  subject. 

An important first step to learning about morale would be to construct
a reliable and valid measure of the concept. Motowidlo and Borman (1977)
were only partially successful in developing such an instrument. The authors
report  some  evidence  for  the  scale's  convergent  validity.  However,  its 
reliability  was  low  and  there  were  indications  of  halo  error  in  ratings. 

The  major  purpose  of  the  present  study  was  to  develop  an  instrument  free 
from  such  deficiencies. 

One  obvious  issue  to  consider  before  developing  a  valid  morale  measure 
is  its  definition.  Unfortunately  there  are  almost  as  many  definitions  of 
morale  as  there  are  people  writing  about  it  (Motowidlo  et  al,  1976).  While 
definitions  differ,  most  writers  (e.g.  Guion,  1958;  Martin,  1965;  Stagner, 
1958)  seem  to  agree  that  morale  represents  an  affective  orientation  toward 
the  work  unit  or  organization  and  includes  "job  satisfaction"  as  one  of  its 
major components. It would therefore appear appropriate to aggregate
member responses to a series of job satisfaction items to obtain an affective
measure  of  the  unit's  morale. 

However,  some  organizational  psychologists  have  questioned  such  an 
approach.  Blum  and  Naylor  (1968)  and  Motowidlo  et  al  (1976),  for  example, 
argue  that  an  adequate  definition  of  morale  should  include  such  factors  as 
motivation  and  cohesion,  and  not  be  limited  to  job  satisfaction  alone. 
Further,  Guion  (1973)  and  Lincoln  and  Zeitz  (1980)  contend  that  while  it 
is possible to aggregate scores on an individual level variable to form an
organizational  attribute,  it  makes  little  sense  to  do  so  with  an  affective 
characteristic  such  as  satisfaction.  They  explain  that  satisfaction,  like 
all  evaluative  or  affective  constructs,  is  subject  to  an  individual's  unique 
motives,  values  and  job  environment.  Since  these  characteristics  differ 
from individual to individual, they believe it would be pointless to
aggregate satisfaction scores in an attempt to form a relatively stable and
generally  agreed  upon  affective  orientation  toward  the  organization.  This 
assumption  will  be  tested  as  part  of  our  attempt  to  develop  a  reliable  and 
valid  organizational  measure  of  morale. 


The  development  of  this  morale  measure  proceeded  in  two  phases.  The 
first  involved  the  comparison  of  satisfaction  scores  against  a  derived  index 
of  affect  to  assess  the  validity  of  using  individual  satisfaction  measures 
to  represent  a  member’s  affective  orientation  toward  the  organization.  The 
second  phase  was  directed  at  examining  the  psychometric  properties  of  a  unit 
morale  measure  that  is  based  on  aggregated  satisfaction  scores. 


Method 


Subjects. Data were collected at three different points in time from
a sample of 55 combat arms, combat support, and combat service support
battalions located at six CONUS installations. At each wave of data collection
an  independent  sample  of  service  members,  NCOs,  and  officers  within  each 
unit  was  randomly  drawn,  using  the  last  digit  of  individual  social  security 
numbers.  The  total  sample  for  each  wave  consisted  of  6,979  service  members, 
5,882  NCOs  and  6,172  officers. 

Procedure and measures. Satisfaction measures were administered on
three separate occasions at six-month intervals to the sample of unit
personnel as part of a larger climate survey instrument. Surveys were
administered in large groups by teams of researchers using standardized
instructions. These satisfaction measures were drawn from the Survey of
Organizations (Taylor and Bowers, 1972) and measured individual satisfaction
with his or her unit, supervisor, coworkers, and job. The four areas of unit,
supervisor, coworkers and job likewise defined the four major content domains
of the overall climate survey. Subjects responded to each of these items
utilizing a five-point scale ranging from 1 ("Strongly Disagree") to
5 ("Strongly Agree").

A  morale  score  for  each  battalion  was  generated  by  first  averaging  the 
battalion  members'  responses  to  the  four  satisfaction  items  into  a  "General 
Satisfaction"  score  for  each  individual.  The  General  Satisfaction  scores 
for  all  battalion  members  were  then  averaged  to  derive  the  battalion  morale 
measure. 
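
The two-step averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the input layout (one tuple per respondent holding a battalion identifier and the four item responses) are hypothetical and do not come from the original survey software.

```python
from collections import defaultdict
from statistics import mean


def general_satisfaction(item_scores):
    """Average one respondent's four satisfaction items (1-5 scale)."""
    return mean(item_scores)


def battalion_morale(responses):
    """Average General Satisfaction within each battalion.

    responses: iterable of (battalion_id, [unit, supervisor, coworkers, job]).
    The layout is an assumption made for this sketch.
    """
    by_battalion = defaultdict(list)
    for battalion_id, items in responses:
        by_battalion[battalion_id].append(general_satisfaction(items))
    return {b: mean(scores) for b, scores in by_battalion.items()}
```

Because each respondent is first reduced to a single General Satisfaction score, every respondent carries equal weight in the battalion mean.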

The independent index of affect, used to determine if satisfaction
represents a member's affective orientation toward the organization, was
constructed by first converting all item responses on the climate survey to
standard scores. All non-satisfaction items were next categorized as being
affectively positive, negative, or neutral by two independent judges. The
z-scores for all positively and all negatively rated items were then averaged
separately, while the neutral items were eliminated from further analysis.
The two resulting statistics were labeled z+ and z-, with the first of these
being an indicant of a subject's tendency to agree to affectively positive
items and the second reflecting agreement to affectively negative items.
Highly significant (p<.001) negative correlations were observed between
z+ and z- of -.50, -.49, and -.47, for waves 1, 2, and 3 respectively,
suggesting that subjects were selectively attending to the affective content
of the items and were responding in a manner consistent with their
generalized affect towards their situations. A single index of this affective
orientation (z) was then produced using the equation: z = (z+) - (z-)/2.
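
A rough Python sketch of this construction follows. The function names, the dictionary layout, and the valence codes "+", "-", and "0" are assumptions made for illustration. The last function applies the equation exactly as it is printed; if the original intended the difference of z+ and z- to be halved as a whole, the grouping in that one line would change.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert one item's raw responses to z-scores (population SD).

    Assumes the item is not constant across respondents.
    """
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def affect_components(item_columns, valence):
    """Return (z_pos, z_neg): per-respondent means over positive and negative items.

    item_columns: {item_name: [response of each respondent]} (hypothetical layout).
    valence: {item_name: "+", "-", or "0"} as coded by the judges; neutral
    items ("0") are dropped, as described in the text.
    """
    z = {item: standardize(col) for item, col in item_columns.items()}
    n = len(next(iter(item_columns.values())))
    pos = [i for i in z if valence.get(i) == "+"]
    neg = [i for i in z if valence.get(i) == "-"]
    z_pos = [mean(z[i][r] for i in pos) for r in range(n)]
    z_neg = [mean(z[i][r] for i in neg) for r in range(n)]
    return z_pos, z_neg


def affect_index(z_pos, z_neg):
    """Combine the two components following the printed equation literally."""
    return [p - q / 2 for p, q in zip(z_pos, z_neg)]
```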
Results


The validity of satisfaction measures as indicants of affective
orientation was determined by their relationship with the independent measure
of affect, z. Table 1 shows the correlations between z and the satisfaction
measures taken individually and as a group for each of three waves of data
collection. It is clear that there is substantial correlation between the
satisfaction measures and the independent measure of affective orientation,
thus validating the hypothesis.

Given  a  high  degree  of  intercorrelation  among  the  four  satisfaction 
items,  responses  to  these  four  items  were  averaged  to  produce  a  single 
General  Satisfaction  score  for  each  individual.  This  single  measure  of 
General Satisfaction was then employed in examining the validity of
affective orientation as an organizational attribute.

Two  different  approaches  were  employed  in  assessing  this  validity. 

The  first  approach  examined  the  discriminant  validity  of  the  General 
Satisfaction  measure  at  the  battalion  level.  If  General  Satisfaction 
varied  only  at  the  individual  level,  it  would  be  randomly  distributed 
across  battalions  such  that  battalions  would  not  differ  on  this  variable. 
However,  if  affective  orientation  was  a  true  organizational  attribute, 
then  different  battalion  settings  would  produce  different  levels  of  the 
General Satisfaction variable. Accordingly, the 55 battalions were
compared on General Satisfaction using a least-squares one-way ANOVA on this
measure. As shown in Table 2, battalions differed significantly on this
measure at each rank level, and this finding was consistent across the
three waves. This suggests that affective orientation is a true
organizational attribute and can thus be analyzed at this level.
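
For readers who want to see the test statistic behind this comparison, the sketch below computes the textbook one-way F ratio with battalions as groups. It is an illustration only: the function name and input layout are hypothetical, and it is not the statistical package the authors used.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of General Satisfaction scores per
    battalion. Group sizes may differ.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within
```

A large F means battalion means differ by more than individual variation within battalions would produce, which is the pattern the discriminant-validity argument relies on.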

The  second  approach  was  to  determine  the  stability  of  satisfaction  at 
the organizational level. Battalion morale would be expected to vary
somewhat from one time period to another due to differences in environmental
conditions  and  the  high  level  of  personnel  turnover  within  units.  However, 
if  morale  is,  in  fact,  a  true  organizational  variable,  some  consistency 
should  be  observed  across  time,  and  one  would  expect  positive  correlations 
in  battalion  morale  across  the  six  months  separating  the  data  collection 
waves.  To  test  this  hypothesis,  the  battalion  members'  General  Satisfaction 
scores  were  aggregated  at  each  wave  to  produce  a  mean  battalion  morale 
score.  Correlation  coefficients  were  then  computed  between  morale  scores 
on  the  adjacent  waves  for  each  rank  group  separately.  Table  3  presents 
the results of this analysis. As can be seen, the correlation coefficients
between  the  Wave  1  and  Wave  2  morale  scores  were  significant  at  each  rank 
level,  offering  some  additional  support  for  the  hypothesis  that  morale  can 
be  conceptualized  as  an  organizational  variable.  However,  the  hypothesis 
was  not  totally  confirmed  as  only  service  members  showed  a  significant 
relationship  in  the  Wave  2/Wave  3  comparisons. 
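
The stability check can be sketched as a Pearson correlation between battalion morale scores at two adjacent waves. The names below are hypothetical, and the choice to use only battalions present in both waves is an assumption of this sketch rather than a detail reported in the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def interwave_r(morale_earlier, morale_later):
    """Correlate battalion morale across two waves.

    Each argument maps battalion id -> mean General Satisfaction for one
    rank group at one wave. Only battalions present in both are used.
    """
    common = sorted(set(morale_earlier) & set(morale_later))
    return pearson_r([morale_earlier[b] for b in common],
                     [morale_later[b] for b in common])
```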
Discussion


The significant correlations found between the satisfaction items and
our independent measure of affect support the commonly held, but largely
untested, assumption that job satisfaction directly reflects an individual's
affective orientation toward his/her work environment. More importantly,
the findings derived through the analyses of the morale measures take affect
out of the realm of individual psychology and suggest that an affective
variable such as morale can be legitimately operationalized at the
organizational level. This conclusion is tempered somewhat by the finding that
the morale measure was not as stable for NCOs and officers as it was for
service members. One possible interpretation is that a different set of
dynamics operates upon morale at the higher levels. Another possibility
relates to the fact that each aggregate score at the higher rank levels
was based on a smaller N than that derived for service members. This
being the case, the NCO and officer morale measures would not possess the
same degree of reliability as the service member data, which may underlie
the attenuated stability of results at the higher levels. Further research
is needed to clarify this issue.

The general conclusion that morale is an organizational variable does
not contradict organizational psychologists like Guion (1973) who note the
importance of distinguishing between attributes of people and attributes
of organizations. However, in support of Lincoln and Zeitz (1980), the
results clearly demonstrate that it is possible to obtain a relatively
stable and generally agreed upon organizational measure by aggregating
individual-level variable scores. While supporting the general proposition
advanced by Lincoln and Zeitz, the results refute their assertion that
affectively-laden concepts such as job satisfaction should not be
aggregated. In making this assertion, Lincoln and Zeitz, like Guion (1973),
appear to be inappropriately equating affective orientation and job
satisfaction. The fact that satisfaction is an individual-level variable in
no way implies that all forms of affect must be conceptualized at this
level. Some characteristics, like morale, can be viewed as shared
attributes of group members and, hence, qualify as organizational variables.

We  suggest  the  proper  level  at  which  to  conceptualize  and  operationalize 
a  construct  should  be  empirically  determined  rather  than  decided  upon  on 
an  a  priori  basis. 

A  separate  question  relates  to  the  adequacy  of  a  morale  measure  that 
is based solely on satisfaction. Although most writers agree that
satisfaction is an important component of morale, some, like Blum and Naylor
(1968) and Motowidlo and Borman (1977), argue that other dimensions should
also be
included  to  capture  its  full  meaning.  We  suggest  that  while  morale  may,  in 
fact,  be  a  multidimensional  variable,  this  does  not  necessarily  imply  that 
a  unidimensional  measure  such  as  the  one  described  in  the  present  paper  is 
inappropriate.  Operational  definitions  of  psychological  constructs  rarely 
(if  ever)  tap  all  relevant  dimensions.  The  field  of  psychology  has  usually 
progressed  by  beginning  with  limited  measures  of  a  particular  concept  and 
subsequently  building  upon  these  first  approximations  (Eltas,  1975).  The 
same  procedure  is  suggested  in  the  case  of  organizational  morale.  Other 
hypothesized dimensions should be incorporated into future measurement
instruments and their discriminant and concurrent validity tested. This
systematic approach should lead to the development of a truly reliable and
valid instrument that does justice to the potential multidimensional nature
of the concept. Once adequate measures are developed, it will then be
possible for researchers to effectively study the antecedents and consequences
of the morale construct, which is thought to play such an important role in
the life of a military organization.
Table 1. Zero-order and Multiple Correlations between Affective Response
Bias (z) and Satisfaction Measures

                                  Wave 1    Wave 2    Wave 3
Zero-order r
  Satisfaction with Job            .61       .62       .60
  Satisfaction with Unit           .63       .63       .63
  Satisfaction with Supervisor     .65       .63       .65
  Satisfaction with Coworkers      .44       .46       .44
Multiple R                         .81       .79       [illegible]


Table 2. Results of One-way ANOVAs Testing Discriminability of General
Satisfaction Measure by Rank and Time

                                         Time
Rank Level               1                   2                   3
                       F (df)              F (df)              F (df)
Service Members    2.748 (53,3577)     3.649 (52,3789)     3.055 (53,4040)
NCOs               2.902 (52,1566)     2.689 (52,1517)     1.794 (53,1755)
Officers           1.908 (48,460)      1.874 (52,597)      3.076 (53,645)

NOTE: p<.001 for all F values.
Table 3. Morale Measure Intercorrelations across Adjacent Waves by
Rank Level

                                Interwave Coefficients (r)
Rank Level                    Wave 1/Wave 2    Wave 2/Wave 3
Service Members                  .3881**          .4109**
Noncommissioned Officers         .2463*           .1941
Officers                         .4945**          .1755

** p < .01
 * p < .05
DLIFLC utilizes multiple sequential screening techniques
for  first-term  enlistees  of  all  Services  who  apply  for  foreign 
language  training.  Candidates  must  qualify  on  the  AFQT,  attain 
specified  scores  on  selected  parts  of  the  ASVAB  and  attain  a 
specified  minimum  score  on  the  Defense  Language  Aptitude 
Battery.  Despite  these  measures,  academic  attrition  ranges 
from  12%  to  30%,  depending  upon  the  language  studied.  A  56-item, 
five-option,  attitude/interest  instrument  was  constructed  and 
administered  to  students  before  their  courses  started.  Item 
leads  centered  on  study  habits,  career  goals,  intercultural 
relationships  and  personal  attributes.  The  five  options  varied 
from  "strongly  agree"  to  "strongly  disagree."  A  scoring  program 
permitted  assignment  of  varying  weights  (4  to  0)  to  item  options. 
For  analysis,  students  were  divided  among  three  achievement  groups 
based  on  6th-week  grades.  This  paper  discusses  the  use  of  cross- 
tabulation  tables  (item  option  by  achievement  group)  for  item 
analysis  and  various  option  weighting  strategies  used  to  improve 
discrimination  among  achievement  groups. 
The purpose of this research was to develop a questionnaire
which would complement the Defense Language Aptitude Battery in
predicting those students of foreign language likely to encounter
academic difficulty early in their course of instruction. The
purpose of this paper is to describe some problems encountered
in using cross-tabulation tables to weight item options to
maximize predictive validity of the questionnaire.

Introduction: Enlisted students at the Defense Language Institute,
Foreign Language Center (DLI) take at least three pencil-and-paper
aptitude tests before being selected for attendance: Armed Forces
Qualification Test, Armed Services Vocational Aptitude Battery
(ASVAB), and Defense Language Aptitude Battery (DLAB). These
tests and the nature of a volunteer military tend to cause the
enlisted student body to be relatively homogeneous with respect
to age and educational experience.

Despite selection procedures, academic attrition is approximately
14% and varies with the difficulty of the language being
learned. Another predictive instrument (hereafter referred to
as FLII) was deemed desirable to enable management to either
(1) reduce attrition by applying different teaching strategies
to those students whose DLAB and FLII scores indicated probable
difficulty, or (2) release early in the course those students
whose DLAB, FLII and course scores indicated academic difficulty
and probable attrition.

Management requirements for the FLII were that its development
consume minimum resources, that it have a short administrative
time (30 minutes or less) and that it be machine scoreable.
Salient psychometric considerations were predictive validity and
low correlation with DLAB.

Method: In order to meet the requirement of using minimum
resources, it was decided to use items from past research projects.
One source was the Foreign Language Interest Inventory developed
at DLI in 1969. Because this instrument was designed to be
administered to students during their course of instruction, many
items were inappropriate for the new instrument or needed revision.
A second source was a HumRRO study (Fiks and Brown, 1969) which
had developed some items that correlated with academic success
in learning foreign languages. The third source was a panel of
experts who had a combined experience level in foreign language
education and testing of 40 years. These three sources provided
56 items for the pilot form. Each five-option item described a
factor (career goal, experience, intercultural relationships,
personal attribute, etc.) thought to be related to learning a
foreign language. The options typically took the form, agree
strongly ... not sure ... disagree strongly, and the order of
presentation was reversed on some items.
Items from the early Foreign Language Interest Inventory and
the  HumRRO  instrument  were  originally  assigned  option  weights  of 
1  (for  one  option)  and  0  (for  the  remaining  options).  Traditional 
item  analysis  provided  insight  into  which  should  be  assigned  1. 

Assigning  option  weights  in  this  manner  seemed  undesirable 
for  items  purportedly  measuring  a  continuum.  A  great  deal  of 
information  may  be  lost  when  one  option  is  weighted  as  "correct" 
and  all  others  "incorrect".  Likewise,  for  statistical  analysis, 
a  great  deal  of  variability  may  be  lost  using  this  weighting 
scheme. Therefore, a scoring program was written for the FLII
which  permitted  assigning  a  weight  of  0  to  4  to  each  item  option. 

The  FLII  was  administered  to  279  students  before  they  started 
their  course  of  instruction.  These  students  were  randomly 
divided  into  two  groups;  a  derivation  sample  and  a  validation 
sample.  Due  to  administrative  attrition  and  incomplete  answer 
sheets,  N=130  and  124  for  the  derivation  and  validation  samples, 
respectively. 

Course grade at the end of six weeks was used as the criterion.
Students were divided into three academic groups according to
course grade: >90 = Hi, 80-90 = Med and <80 = Lo. Table 1 summarizes
information on sample size and academic group.


TOTAL SAMPLE N=254

DERIVATION N=130
ACADEMIC GROUP      HI        MED       LOW
GRADES              91-100    80-90     0-79
N=                  53        50        27

VALIDATION N=124
ACADEMIC GROUP      HI        MED       LOW
GRADES              91-100    80-90     0-79
N=                  53        40        31

TABLE 1. SUMMARY OF SAMPLE
Item analysis was accomplished using 3 x 5 contingency tables
(academic group by item option) generated for each item by
Subprogram Crosstabs from the Statistical Package for the Social
Sciences (Nie et al., 1975). Fig. 1 shows "ideal" column
percentages for an imaginary item measuring a continuous variable
using five options. In this ideal example, half of the high (Hi)
academic group chose option A. Other options were chosen by fewer
members of the high group. The students doing relatively poorly
in the course (Lo) most often chose options representing the
opposite end of the continuum. The middle achievers (Med) most
often selected options representing the middle of the continuum.


               ACADEMIC GROUP
OPTION      LOW      MED      HI
  A           0        5      50
  B          10       20      25
  C          15       50      15
  D          25       20      10
  E          50        5       0

Fig. 1. Contingency Table, "Ideal" Results - Column Percentages
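
A table of this kind can be produced with a short routine such as the Python sketch below. It is not the SPSS Crosstabs subprogram the authors used; the function names and the input layout (one pair of sixth-week grade and chosen option per student, for a single item) are assumptions made for illustration. The grade cut-offs follow the groups defined in the text.

```python
from collections import Counter


def academic_group(grade):
    """Classify a sixth-week grade: above 90 is HI, 80-90 is MED, below 80 is LOW."""
    if grade > 90:
        return "HI"
    if grade >= 80:
        return "MED"
    return "LOW"


def column_percentages(records, options="ABCDE"):
    """Percentage of each academic group choosing each option of one item.

    records: iterable of (sixth_week_grade, chosen_option).
    Returns {group: {option: percent}}; an empty group gets all zeros.
    """
    counts = {group: Counter() for group in ("LOW", "MED", "HI")}
    for grade, option in records:
        counts[academic_group(grade)][option] += 1

    table = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        table[group] = {o: (100.0 * counter[o] / total if total else 0.0)
                        for o in options}
    return table
```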
Guttman (1941) showed that, for a quantitative criterion,
the criterion mean of the examinees who chose a given option could
be used as the option weight. However, it was decided to use
Subprogram  Crosstabs  because  it  was  readily  available  and  the 
output  was  in  a  form  useful  to  other  research. 
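
The criterion-mean idea attributed to Guttman above can be illustrated as follows. This sketch implements only the one-sentence description given in the text, not the full procedure of the 1941 paper, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def criterion_mean_weights(records):
    """Weight each option by the mean criterion score of those who chose it.

    records: iterable of (criterion_score, chosen_option) for one item,
    for example (sixth-week grade, "B").
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for criterion, option in records:
        totals[option] += criterion
        counts[option] += 1
    return {option: totals[option] / counts[option] for option in totals}
```

Options that nobody chose receive no weight from this routine; how to treat them would have to be decided separately.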

Results: An a priori weight table, with options scored 0, 1,
2, 3, 4 or 4, 3, 2, 1, 0 depending upon how the options were
presented,  was  used  so  that  a  high  FLII  score  would  be  indicative 
of  high  academic  grades.  Using  this  straightforward  weighting 
scheme,  the  FLII  score  correlated  (Pearson)  .2966  with  sixth- 
week  grades.  In  light  of  Cronbach's  (1970)  assertion  that 
"Correlations  of  interests  with  grades  in  related  fields  are 
generally  below  0.30...",  the  magnitude  of  this  correlation 
was  encouraging. 
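
A minimal sketch of scoring with such an a priori weight table is given below. The names and the dictionary layout are hypothetical; the original scoring program is not described in enough detail to reproduce.

```python
OPTIONS = "ABCDE"


def a_priori_weights(descending=True):
    """Weights 4,3,2,1,0 for options A-E, or 0,1,2,3,4 for reversed items."""
    values = [4, 3, 2, 1, 0] if descending else [0, 1, 2, 3, 4]
    return dict(zip(OPTIONS, values))


def flii_score(answers, weight_table):
    """Sum the weights of the options a student selected.

    answers: {item_number: chosen_option}
    weight_table: {item_number: {option: weight}}
    """
    return sum(weight_table[item][choice] for item, choice in answers.items())
```

For example, with item 1 scored in descending order and item 2 reversed, a student answering B and E would score 3 + 4 = 7.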

Contingency  tables  for  each  item  were  analyzed.  The  analysis 
indicated  that  "Lo  students"  and  "Hi  students"  displayed  rather 
similar  option  selection  patterns.  Unlike  the  pattern  in  Fig.  1., 
differences  between  Hi  and  Lo  groups  were  a  matter  of  degree  rather 
than direction. Fig. 2 shows the contingency table (column
percentages only) for item 37. Whereas 84% of the Lo group and 78%
of  the  Hi  group  picked  options  A  and  B,  3  of  4  "Hi  students"  chose 
B  and  only  1  of  2  "Lo  students"  picked  B. 


                                      ORIGINAL    REVISED
OPTION      LOW          MED    HI    WEIGHT      WEIGHT
  A          33           28     6       4           0
  B          51           56    73       3           4
  C          11           16    20       2           3
  D           1            0     0       1           0
  E      [illegible]       0     0       0           0

Fig. 2. Item 37 Contingency Table, Column Percentages Only.
Original and Revised Weights Shown at Right.
A similar pattern of responses was found for many items, and
option weights were revised following these rules:

1) A high weight (usually 4) was assigned to that option chosen
most often by the Hi group.

2) Lower weights were assigned to options when more Hi than Lo
selected the option.

3) Zero was assigned to those options when more Lo than Hi selected
them.


Note  that  no  items  were  discarded  at  this  time.  Fig.  2.  shows 
original  and  revised  weights  for  item  37  based  on  this  weighting 
strategy. FLII scores now correlated (Pearson) .09 with sixth-week
grades.
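
One way to express the three revision rules in code is sketched below. The rules do not say how the "lower weights" were chosen, so this sketch assumes they step down from the top weight in order of Hi-group popularity; that ordering, the treatment of ties as zero, and all names are assumptions of the sketch rather than the authors' procedure.

```python
def revised_weights(hi_pct, lo_pct, top=4):
    """Apply the three revision rules to one item.

    hi_pct, lo_pct: {option: column percentage} for the Hi and Lo groups.
    """
    # Rule 1: the option chosen most often by the Hi group gets the top weight.
    best = max(hi_pct, key=hi_pct.get)
    weights = {option: 0 for option in hi_pct}  # Rule 3 is the default.
    weights[best] = top

    # Rule 2: options favoured by Hi over Lo get lower, non-zero weights
    # (assumed here to descend from top - 1, never below 1).
    favoured = sorted(
        (o for o in hi_pct if o != best and hi_pct[o] > lo_pct[o]),
        key=hi_pct.get,
        reverse=True,
    )
    for rank, option in enumerate(favoured, start=1):
        weights[option] = max(top - rank, 1)
    return weights
```

Applied to the legible item 37 percentages for options A through D, this reproduces the revised weights listed for that item (0, 4, 3, 0).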

Using five weights was notably unsuccessful. To reduce complexity,
and because responses of Hi and Lo groups appeared too similar to warrant
five weights, a simplified weighting strategy was used:

1) A weight of 2 was assigned to options where the percentage of Hi
selecting an option exceeded the percentage of Lo by 20 or more.

2)  A  weight  of  1  was  assigned  to  options  where  the  percentage 
difference  was  10-19. 

3)  Zero  was  assigned  to  all  other  options. 
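
The simplified rule set maps directly onto a small function. In this sketch the names are hypothetical, and a difference falling between 19 and 20 percentage points (possible with unrounded percentages) is assumed to receive a weight of 1.

```python
def simplified_weights(hi_pct, lo_pct):
    """Assign 2, 1, or 0 to each option from the Hi-minus-Lo percentage gap.

    hi_pct, lo_pct: {option: column percentage} for the Hi and Lo groups.
    """
    weights = {}
    for option, hi in hi_pct.items():
        difference = hi - lo_pct.get(option, 0.0)
        if difference >= 20:
            weights[option] = 2   # Hi exceeds Lo by 20 or more
        elif difference >= 10:
            weights[option] = 1   # Hi exceeds Lo by 10-19
        else:
            weights[option] = 0   # all other options
    return weights
```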

The correlation between FLII score (using the new weighting) and
sixth-week  grades  was  .13.  It  rose  to  .24  when  the  Med  group  was 
excluded  from  calculation. 

Since  the  Med  group  appeared  to  adversely  affect  correlation 
and  had  been  ignored  in  weighting  strategies,  it  was  decided  to 
include  them  in  the  weighting  strategy.  Fig.  1.,  depicting  an 
"ideal"  response  pattern,  shows  that  a  pattern  exists,  not  only 
within  columns,  but  across  rows. 

In  order  to  capitalize  on  the  row  and  column  pattern  a  new 
weight  strategy  was  used: 

1) Options A and E were assigned 4 or 0 if they closely matched
the "ideal" row and column pattern. They were assigned 3 or 1 if
they matched either row or column pattern.

2) Options B and D were assigned 3 or 1 if they closely matched
the ideal row and column patterns or 2 if they matched only one
pattern.

3) Option C weight was assigned dependent upon the pattern and
weights of other options.
With this weight strategy, 7 items were discarded because of
lack of any pattern. FLII score correlated .15 with sixth-week
grades.

Thus far, all weighting strategies were inferior to the
original, a priori weight table. It was decided to revert to
the original weights, but refine them by assigning zero weight
to those options chosen by 10% or less of the Hi group (see
Fig. 3). Seven items were deleted because response patterns
showed no distinction between Hi and Lo groups.

                                      ORIGINAL    REVISED
OPTION      LOW          MED    HI    WEIGHT      WEIGHT
  A          74           77    83       4           4
  B      [illegible]      18    13       3           3
  C           7            4     3       2           0
  D           0            0     0       1           0
  E           0            0     0       0           0

Fig. 3. Item 24 Contingency Table, Column Percentages
Only. Original and Revised Weights Shown at Right.


FLII  scores  now  correlated  .39  with  sixth-week  grades  and 
.15  with  DLAB  scores.  The  correlation  with  grades  regressed 
to  .22  when  the  weight  table  was  applied  to  the  validation  sample. 
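
The final refinement, which keeps the a priori weights but zeroes out options rarely chosen by the Hi group, can be sketched as below. The function name and argument layout are hypothetical.

```python
def refine_weights(a_priori, hi_pct, threshold=10.0):
    """Zero the weight of any option chosen by `threshold` percent or less of Hi.

    a_priori: {option: original weight}
    hi_pct:   {option: percentage of the Hi group choosing the option}
    """
    return {option: (0 if hi_pct[option] <= threshold else weight)
            for option, weight in a_priori.items()}
```

With the item 24 Hi-group percentages from Fig. 3 (83, 13, 3, 0, 0) and original weights 4, 3, 2, 1, 0, the result is 4, 3, 0, 0, 0, which agrees with the revised weights shown in that figure.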


Conclusion: Gage (1957) found that partially a priori scores
for five-option items were at least as valid for prediction as
an elaborate weighting method. This research is partially
supportive of those findings. While some improvement in
predictive validity was possible using contingency tables for item
analysis and variable option weighting, the method was trial-and-
error and more time consuming than expected. The Guttman method
is more rigorous and writing a computer program for its use
does not appear difficult. DLI is presently writing such a
program, and the results will permit a comparison between
Guttman's method and the contingency table method used in this
research.
1.0 INTRODUCTION


1.1 Statement of the Problem


The Skill Qualification Test (SQT) is a primary means the Army uses to assess soldiers'
proficiency in performing critical tasks of their military occupational specialties
(MOSs). It contains performance-oriented and criterion-referenced tests on critical
tasks selected from the soldier's manual for a duty position. Test results feedback
is provided to soldiers, supervisors, training managers, and training/test developers.
The SQT usually contains a written Skill Component (SC), a Hands-On Component (HOC),
and a Job Site Component (JSC).

Although the SQT program was designed to maximize hands-on performance testing,
problems of standardization, inter-rater scoring reliability, extensive time and
resource requirements, training of test administrators, and overall administrative
feasibility have attenuated its success. Much skill qualification testing has become
primarily written in nature, via the SC or a written Alternate HOC. The written
tests allow for standardized administration and scoring, but often have low fidelity
to actual job performance requirements. The intended objective of performance-based,
criterion-referenced SQT testing could be more effectively realized by a) increased
fidelity of the SQT (that is, more similarity) to job requirements and task
performance; b) reduced reliance on written components; c) reduced time between test
administration and test results; and d) increased feasibility of unit level SQT
administration.


1.2 Concept of Computer-Based Embedded Testing

The increasingly extensive use of the computer in military systems offers an
opportunity to alleviate present shortcomings common to conventional SQTs for tactical
data and weapons systems. The concept is computer-based embedded SQT testing in
which the computer of a tactical weapons system monitors, controls, and scores a
test on actual job tasks administered on that system itself (or a simulator). This
paper describes the development and validation of a computer-based SQT Hands-On
Component (HOC) for the TACFIRE Artillery Control Console Operator (ACCO), MOS 13C.


The views expressed in this paper are those of the authors and do not necessarily
imply endorsement of the Department of the Army or the Army Research Institute.

1 Work described in this paper was performed by Honeywell Inc. for the U.S. Army
Research Institute for the Behavioral and Social Sciences, contract no.
MDA9u3-7d-C-0386.
An alternative to present SQTs for tactical data systems is the concept of a
computer-based embedded SQT administered on the operational system or simulator to
accomplish standardized, objective, hands-on skill qualification assessment. This
paper describes a computer-based SQT Hands-On Component developed for TACFIRE
Artillery Control Console Operators, MOS 13C, using the PLANIT courseware authoring
language. The SQT complies with TRADOC policy and procedures for SQT development
and validation (TRADOC Reg 351-2, Pam 351-2). An expert tryout for content validity
and a representative-soldier tryout for feasibility of administration were conducted.
A method for diagnosing and correcting logic errors in the courseware (vs. operators'
errors) was applied. Benefits of computer-based SQT testing include: maximum
fidelity attained by utilizing the operational system; control, monitoring, and
scoring performed by the tactical system computer; centralized or field unit
testing capability; immediate, detailed performance feedback provided; test results
compatible in format with TRADOC guidelines.
COMPUTER-BASED SKILL QUALIFICATION TESTING: DEVELOPMENT AND VALIDATION¹

Christopher  G.  Koch 
Judith  A.  Englert 
Richard  E.  Vestewig 
Honeywell  Systems  and  Research  Center 
2600 Ridgway Pkwy., P.O. Box 312, Minneapolis, MN 55440

John  T.  Larson 

U.S.  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 
5001  Eisenhower  Ave.,  Alexandria,  VA  22333 


1.0  INTRODUCTION 


1.1  Statement of the Problem

The Skill Qualification Test (SQT) is a primary means the Army uses to assess soldiers'
proficiency in performing critical tasks of their military occupational specialties
(MOSs). It contains performance-oriented and criterion-referenced tests on critical
tasks selected from the soldier's manual for a duty position. Test results feedback
is provided to soldiers, supervisors, training managers, and training/test developers.
The SQT usually contains a written Skill Component (SC), a Hands-On Component (HOC),
and a Job Site Component (JSC).

Although the SQT program was designed to maximize hands-on performance testing,
problems of standardization, inter-rater scoring reliability, extensive time and
resource requirements, training of test administrators, and overall administrative
feasibility have attenuated its success. Much skill qualification testing has become
primarily written in nature, via the SC or a written Alternate HOC. The written
tests allow for standardized administration and scoring, but often have low fidelity
to actual job performance requirements. The intended objective of performance-based,
criterion-referenced SQT testing could be more effectively realized by a) increased
fidelity of the SQT (that is, more similarity) to job requirements and task
performance; b) reduced reliance on written components; c) reduced time between test
administration and test results; and d) increased feasibility of unit level SQT
administration.

1.2  Concept of Computer-Based Embedded Testing

The increasingly extensive use of the computer in military systems offers an opportunity
to alleviate present shortcomings common to conventional SQTs for tactical
data and weapons systems. The concept is computer-based embedded SQT testing in
which the computer of a tactical weapons system monitors, controls, and scores a
test on actual job tasks administered on that system itself (or a simulator). This
paper describes the development and validation of a computer-based SQT Hands-On
Component (HOC) for the TACFIRE Artillery Control Console Operator (ACCO), MOS 13C.²


¹ The views expressed in this paper are those of the authors and do not necessarily
imply endorsement of the Department of the Army or the Army Research Institute.

² Work described in this paper was performed by Honeywell Inc. for the U.S. Army
Research Institute for the Behavioral and Social Sciences, contract no.
MDA9u3-7d-C-0386.


The SQT HOC consists of courseware written in a computer-assisted instruction
system and authoring language called PLANIT. It is administered in an embedded
mode on the TACFIRE tactical system in either the school or the field setting. The SQT
complies with TRADOC policy and procedures for SQT development and validation
(TRADOC Reg 351-2, Pam 351-2) and offers many advantages over conventional HOC testing
for the MOS 13C duty position.

1.3  Background

1.3.1  TACFIRE System.  The TACFIRE system selected as the testbed for applying the
embedded  SQT  concept  is  a  computerized  TACtical  FIRE  direction  system  designed  to 
coordinate  the  command,  control,  and  communications  functions  within  and  among  all 
levels  of  the  field  artillery  system  from  corps  to  forward  observers.  TACFIRE  may 
be  conceptualized  as  an  interrelated  set  of  subscribers,  each  of  whom  has  access 
to  a  common  data  base  and  a  powerful  set  of  applications  software.  The  artillery 
control  console  (ACC)  shown  in  Figure  1  is  the  principal  man-machine  interface  in  the 
TACFIRE  system. 




Figure  1.  TACFIRE  Artillery  Control  Console 


The ACCO is a grade E6 or E7 noncommissioned officer responsible for maintaining and
updating the TACFIRE data base. His functions include: entering data required for
artillery target intelligence, fire planning, tactical and technical fire control,
and commander's criteria; controlling communications among the network of TACFIRE
subscribers; and initiating, processing, and/or terminating artillery fire
missions. Task activities for which the ACCO has responsibility include:

o  Receiving and interpreting information. The ACCO receives information from
   the subscriber network on the receive display (RD) screen. This information
   is encoded in TACFIRE-specific formats, and requires that the ACCO be
   skilled in interpreting the information and acting quickly.

o  Composing and sending messages. The ACCO enters data into the system via
   special message formats on the compose/edit display (C/ED) screen, then
   makes proper switch-actions to initiate processing of the data and transmission
   to designated on-line subscribers.

1.3.2  PLANIT System.  The embedded SQT for the TACFIRE system was written in
Programming LANguage for Interactive Teaching (PLANIT), a powerful computer-assisted
instruction  system  and  courseware  authoring  language  useful  for  preparing  and 
administering  training  and  evaluation  scenarios  via  computer  terminal.  Courseware 
is  constructed  either  interactively  or  off-line,  and  the  author  may  execute  and/or 
edit  a  lesson  at  any  time.  PLANIT  automatically  maintains  an  audit  trail  of  each 
student's  progress  through  a  course,  thus  relieving  the  instructor  or  scorer  of 
tedious  record-keeping  responsibilities. 

The  PLANIT  operating  system  is  made  machine  transportable  and  has  been  installed  on 
computer  systems  supporting  FORTRAN,  PASCAL,  and  TACPOL  (language  used  for  TACFIRE). 
For  the  TACFIRE  AN/GYK-12  application,  enhancements  to  the  operating  system  software 
were  implemented  to  extend  PLANIT’s  capabilities  to  a  specific  hardware  system. 
PLANIT's  enhanced  language  for  TACFIRE  permits  the  lesson  author  courseware  control 
of  switch  indicator  lights,  and  interprets  the  operation  of  ACC  function  keys. 

Outputs may be directed to either of two screens, to the electronic line printer
(ELP), or to the switch panel assembly (SPA). Inputs are processed from the C/ED
screen, the alphanumeric keyboard, or the SPA. Under control of PLANIT's enhanced
language for TACFIRE, the TACFIRE ACC can operate exactly as it would under control
of the operational system software in tactical mode.

2.0  DEVELOPMENT  OF  COMPUTER-BASED  SQT 

2.1  Requirements

The development of the HOC offered a unique challenge for complying with regulations
that do not explicitly address issues for computer-based SQT testing. Although
some  steps  identified  in  TRADOC  Pam  351-2  were  not  relevant  (for  example,  training 
hands-on  test  scorers  and  assessing  inter-rater  reliability),  and  other  additional 
developmental  steps  were  included,  the  TRADOC  regulation  and  pamphlet  served  as 
adequate  guidance  for  the  development  process. 

2.2  Development  Process 

The  major  activities  conducted  during  development  of  the  SQT  were: 

1.  Select  tasks  for  testing. 

2.  Assign  tasks  to  SQT  components. 

3.  Develop  SQT  plan. 
4.  Develop test materials.

a.  Collect  task  performance  data. 

b.  Organize  hands-on  tests. 

c.  Program  courseware  for  hands-on  tests  on  commercial  computer. 

d.  Try  out  courseware  materials  on  TACFIRE  computer. 

e.  Revise  hands-on  tests. 

5.  Develop support documentation: "Manual for the Administration of the
    Hands-On Component" (MAHOC), SQT Notice hands-on test worksheets,
    "Data Supplement," and "Operator's Guide for Preparing SQT Materials."

6.  Conduct  expert  tryout. 

7.  Submit  initial  draft. 

8.  Conduct  soldier  tryout. 

Pertinent characteristics of the finalized TACFIRE embedded SQT HOC are summarized
in  Table  1.  Koch,  Pine,  and  Steinheiser  (1980)  and  Koch,  Englert,  and  Vestewig 
(1981)  provide  details  of  the  development  process  for  the  SQT. 

TABLE 1.  CHARACTERISTICS AND CAPABILITIES OF THE
TACFIRE EMBEDDED SQT HANDS-ON COMPONENT


Objective

o  Assess qualification of artillery control console operator (ACCO),
   MOS 13C, skill levels 3 and 4.

Conditions of Testing

o  Delivered on operational tactical TACFIRE equipment using TACFIRE
   AN/GYK-12 computer or on TACFIRE Training System (TTS).
o  Capability of simultaneous testing of soldiers.
o  High fidelity to soldier's manual (FM 6-13C3, FM 6-13C4) tasks.
o  Standardized administration at either centralized (Field Artillery
   School) or field unit installations.

Test Organization

o  HOC composed of hands-on tests.
o  Automatic presentation of all necessary test instructions and hands-on
   test conditions, standards, and performance measures.

Test Scoring

o  Scoring performed automatically by TACFIRE computer.
o  Each hands-on test scored "GO" or "NO-GO" based on number of performance
   measures passed and satisfaction of time limit standard.
o  Qualification on SQT based on criterion percentage of hands-on tests
   scored "GO."

Test Results Feedback

o  Immediate test results and performance feedback.
o  Performance feedback specific to items at the performance measure level.
o  Individual Soldier's HOC Report providing tabulation of results.
o  HOC Record Sheet providing official record of test results.
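
The scoring rules listed under Test Scoring can be illustrated with a short sketch. It is written in Python rather than PLANIT, and it is not the courseware's actual logic. The table does not state the passing thresholds, so the required number of performance measures, the time limits, and the 70 percent qualification criterion used below are hypothetical, as are all names.

```python
from dataclasses import dataclass


@dataclass
class HandsOnResult:
    measures_passed: int    # performance measures the examinee passed
    measures_required: int  # hypothetical minimum needed for a "GO"
    elapsed_s: float        # time the examinee took, in seconds
    time_limit_s: float     # time limit standard for this hands-on test


def score_test(result):
    """Score one hands-on test: GO needs enough measures passed AND the time limit met."""
    enough_measures = result.measures_passed >= result.measures_required
    within_time = result.elapsed_s <= result.time_limit_s
    return "GO" if (enough_measures and within_time) else "NO-GO"


def qualifies(results, criterion_pct):
    """Qualify on the HOC if the percentage of tests scored GO meets the criterion."""
    scores = [score_test(r) for r in results]
    pct_go = 100.0 * scores.count("GO") / len(scores)
    return pct_go >= criterion_pct
```

For example, an examinee with three tests scored "GO" out of four reaches 75 percent and would qualify under the assumed 70 percent criterion.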


3.0  VALIDATION  OF  SQT  HANDS-ON  COMPONENT 

3.1  Validation  Plan 

The validation phase of the SQT development involved the production of a validation
plan and two formal tryouts of the HOC in accordance with the TRADOC guidelines
(TRADOC Pam 351-2). The objective of the validation was to examine the HOC for
content validity, accuracy, perceived fairness, and administration feasibility.

The strategy for validation of the HOC is reflected in the flow-chart in Figure 2.

Two unique issues were important in assuring an acceptable validation of the
computer-based, embedded SQT. The first concerned a necessary distinction between
logic errors in the hands-on test courseware and unfair or inaccurate test content
per se. Thus, two types of validation were required: one for courseware materials
and one for hands-on test technical content. The second issue concerned the
requirements for administration (using the resources of a typical unit) and
maintainability associated with a computer-based SQT.


[Figure 2 panels: EXPERT TRYOUT, SOLDIER TRYOUT]


Figure  2.  Flow  Diagram  of  SQT  HOC  Validation  Process 
3.2  Expert  Tryout 

3.2.1  Objectives.  The objectives of expert tryout of the TACFIRE SQT HOC were to:

o  Provide TACFIRE subject matter experts an opportunity to systematically
   review each hands-on test for accuracy of the task procedures as defined
   by the soldier's manual.

o  Examine the draft MAHOC for content, readability, and clarity.

o  Identify and implement any necessary revisions to the hands-on tests,
   instructions, and draft MAHOC.

o  Determine test administration requirements, including time to set up and
   administer the hands-on tests.

3.2.2  Procedure.  The participants in expert tryout were 14 instructors in the
TACFIRE Advanced Training Program at the U.S. Army Field Artillery School, Fort
Sill, Oklahoma. Both noncommissioned and commissioned officers were included
in the sample. Five experts completed the SQT3 HOC (the HOC for skill level 3),
and six experts completed the SQT4 HOC (the HOC for skill level 4). Each of the
experts served both as test administrator and examinee. A two-hour workshop was
conducted to familiarize the participants with procedures for preparing the
computer-based HOC to be administered.
During expert tryout, the participants executed the support procedures routine to
prepare the HOC to be administered and then performed each of the hands-on tests
in the same manner as an examinee would. Data of various types were collected to
meet the stated objectives. Table 2 summarizes the type, purpose, and collection
time of these data.


TABLE 2.  EXPERT TRYOUT DATA COLLECTION

Type                      Purpose                            When Collected
------------------------  ---------------------------------  ----------------------------
Expert Review Form        Examine task domain sampling,      End of each hands-on test
                          hands-on test accuracy, test
                          instructions, and test fairness

Draft MAHOC Review Form   Examine draft MAHOC                End of workshop to train
                                                             test administrators

Support Procedures        Examine test administrator's       End of test session
Review Form               support procedures

General Computer          Examine general HOC instructions   End of test session
Instructions Review Form  to the examinee

Error Data                Examine hands-on tests             Recorded automatically by
                                                             computer during test session

3.2.3  Results.  Responses to review forms on the draft MAHOC, support procedures,
and general instructions identified areas for revision. The section in the MAHOC
dealing with computer malfunction recovery was considered inadequate, and certain
portions of the general instructions were rated as unclear. Prerequisite data provided
on the ELP for some hands-on tests were difficult to read due to poor printer
type element quality and caused some undeserved errors. Responses on the expert
review forms dealing with technical content of each hands-on test identified problems
in these areas: task requirements unclear; only one try allowed for selecting
message format switches; optional data entries/blanks not allowed; alternate legal
forms of data entry not allowed; answer match lines containing typographical errors.

Analysis  of  experts'  errors  on  the  hands-on  tests  yielded  three  identifiable 
causes  of  performance  measure  failures: 

o  Operator error: incorrect switch-action or failure to complete hands-on
   test within appropriate time limit.

o  Courseware logic error: mistakes in the PLANIT courseware that may cause
   unwarranted performance measure failures.

o  Inappropriate time standard: insufficient time allowed for completing
   hands-on tests.

Distinguishing  operator  errors  from  logic  errors  required  an  examination  of  the 
printer  output  for  inconsistencies  between  the  expert's  response  and  his  detailed 
test  performance  feedback. 
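
The comparison described above can be pictured as a consistency check on each failed performance measure. In the study the printer output was examined by hand; the Python sketch below only illustrates the reasoning, and its function name, arguments, and category labels are hypothetical. It assumes the reviewer knows which entries the soldier's manual accepts, independently of the courseware's own answer match lines.

```python
def classify_failure(response, accepted_answers, elapsed_s, time_limit_s):
    """Suggest a likely cause for one performance measure the courseware scored as failed.

    response          what the examinee actually entered or selected
    accepted_answers  entries the soldier's manual accepts (assumed known to the reviewer)
    elapsed_s         time taken on the hands-on test, in seconds
    time_limit_s      time limit standard for the test, in seconds
    """
    if response not in accepted_answers:
        # The entry itself was wrong, so the failure is deserved.
        return "operator error"
    if elapsed_s > time_limit_s:
        # Correct but late: either the operator was slow or the standard is
        # too strict. Timing data from other examinees is needed to decide.
        return "time limit exceeded"
    # Correct and on time, yet scored as a failure: the feedback is
    # inconsistent with the response, which points at the courseware.
    return "courseware logic error"
```

A failure recorded against a correct, timely response is the inconsistency that signals a courseware logic error rather than an operator error.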

The remainder of the error analysis focused on the operator errors. Four classifications
of operator errors and the average frequency with which they were committed
are: a) incorrect data entry, 52%; b) incorrect message format selection, 23%;
c) incorrect switch action, 13%; d) excessive time to complete the task, 12%.
3.2.4  Revisions.  Revisions to the hands-on test materials based on expert tryout
data included: expanding the computer malfunction recovery section of the MAHOC;
developing a "Data Supplement" containing hardcopy of prerequisite data for all
hands-on tests; rewriting portions of general instructions in simpler language;
clarifying task requirements; implementing two tries to select correct message
format switches; allowing optional data entries/blanks; allowing alternate legal
forms of data entries; correcting typographical errors in answer match lines;
correcting courseware logic errors; revising inappropriate time limit standards.
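
Three of these revisions concern how an examinee's entry is matched against the expected answer: alternate legal forms, optional blanks, and a second try at selecting the message format. The paper does not show the PLANIT answer match syntax, so the following is a Python sketch of the matching behavior only, with hypothetical names and example values.

```python
def entry_matches(entry, legal_forms, optional=False):
    """Accept any legal form of a data entry; accept a blank only for optional fields."""
    cleaned = entry.strip().upper()
    if cleaned == "":
        return optional
    return cleaned in {form.upper() for form in legal_forms}


def format_selected(attempts, correct_switch, max_tries=2):
    """Return True if the correct message format switch is chosen within max_tries."""
    return correct_switch in list(attempts)[:max_tries]
```

Under this sketch, a blank entry passes only when the field is marked optional, and a correct switch selection on the second attempt still counts.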

After modifications to the SQT HOC had been implemented, the hands-on tests were
again administered to a subset of the expert tryout participants to verify accuracy
of the revisions.

3.3  Soldier  Tryout 

3.3.1  Objectives.  The objectives of soldier tryout of the TACFIRE SQT HOC were to:

o  Determine  the  HOC's  administrative  feasibility  in  a  realistic  test 
setting  with  representative  soldiers  in  the  MOS. 

o  Continue  to  examine  the  hands-on  tests  for  accuracy,  appropriateness, 
and  fairness. 

o  Perform  statistical  analyses  of  hands-on  test  performance  data. 

o  Continue  to  examine  the  draft  MAHOC  for  clarity  and  completeness. 

o  Implement revisions to the hands-on tests, instructions, and draft
   MAHOC where necessary.

3.3.2  Procedure.  The  soldier  tryout  participants  were  22  soldiers  in  MOS  13C 
stationed  at  Fort  Sill,  Oklahoma.  Ten  staff  sergeants  completed  SQT3  HOC  and  ten 
sergeants  first  class  completed  SQT4  HOC.  Personnel  from  the  Directorate  of 
Training  Development,  U.S.  Army  Field  Artillery  School  served  as  test  administrators. 

The  type,  purpose,  and  collection  time  of  soldier  tryout  data  are  shown  in  Table  3. 
The  data  derived  from  these  sources  were  analyzed  according  to  the  process  depicted 
in Figure 3.


TABLE 3.  SOLDIER TRYOUT DATA COLLECTION

Type                      Purpose                            When Collected
------------------------  ---------------------------------  ----------------------------
Soldier Review Form       Examine hands-on tests for         End of test session
                          fairness, accuracy, and clarity
                          of instructions

General Computer          Examine general HOC instructions   End of test session
Instructions Review Form  to the examinee

Error Data                Examine hands-on tests; perform    Recorded automatically by
                          statistical evaluation of HOC      computer during test session

Timing Data               Determine test administration      Recorded automatically by
                          requirements; examine performance  computer during test session
                          standards for task completion
                          time
Figure 3.  Flow Diagram of Error Analysis and Test
Revision Process Following Soldier Tryout

3.3.3  Results.  Responses on the general instructions review forms identified two
portions of the instructions that some soldiers found unclear: the action to take after
a discovered error, and the amount of help expected from the test site manager.
Responses  on  the  soldier  review  forms  evidenced  that  a  large  percentage  of  the 
soldiers  viewed  the  SQT  HOC  as  a  fair  and  acceptable  measure  of  their  ability  to 
perform  job  duties. 

Errors committed on the hands-on tests were again categorized by source: operator
error, courseware logic error, and inappropriate time standard. The type and
average frequency of errors attributable to the operator are: a) incorrect data
entry, 51%; b) incorrect message format selection, 18%; c) incorrect switch action,
21%; d) excessive time to complete the task, 10%.

Analysis  of  the  timing  data  for  the  representative  soldiers  provided  an  empirical 
basis  for  examining  the  adequacy  of  the  hands-on  test  time  limit  standards.  An 
average of 83% of soldiers taking SQT3 HOC finished the hands-on tests within given
time limit standards; 92% of those taking SQT4 HOC finished within the standards.
Three specific criteria were used to judge the adequacy of each hands-on test time
standard: a) equals approximately double the average time required by representative
soldiers; b) exceeds the upper limit of the range of task completion times;
c) equals a value defined by training developers to be mission-related.
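
The three criteria can be checked against recorded completion times as in the Python sketch below. The paper does not say how close "approximately double" must be or how the criteria were weighed against one another, so the 20 percent tolerance is an assumption and each criterion is simply reported on its own. Function and key names are hypothetical.

```python
from statistics import mean


def check_time_standard(standard_s, completion_times_s, mission_value_s=None,
                        tolerance=0.2):
    """Report, criterion by criterion, how a time limit standard compares with tryout data.

    standard_s          proposed time limit for one hands-on test, in seconds
    completion_times_s  completion times recorded for representative soldiers
    mission_value_s     mission-related value from training developers, if one exists
    tolerance           assumed allowance for "approximately double" (20% by default)
    """
    doubled_average = 2 * mean(completion_times_s)
    return {
        # a) roughly twice the average time soldiers needed
        "about_double_average":
            abs(standard_s - doubled_average) <= tolerance * doubled_average,
        # b) longer than the slowest observed completion time
        "exceeds_slowest_time": standard_s > max(completion_times_s),
        # c) equal to a mission-related value, when developers have defined one
        "matches_mission_value":
            mission_value_s is not None and standard_s == mission_value_s,
    }
```

With completion times of 100, 120, and 140 seconds, a 240-second standard satisfies criteria a) and b), while a 130-second standard satisfies neither.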

3.3.4  Revisions.  Revisions to the hands-on test materials based on soldier tryout
data  included:  rewriting  portions  of  general  instructions;  including  SQT  HOC 
administration  time  guidelines  based  on  soldier  tryout  in  the  MAHOC;  correcting 
courseware  logic  errors;  adjusting  hands-on  test  time  standards. 

4.0  ADMINISTRATION  AND  MAINTENANCE  ISSUES 

4.1  Issues in Administering the SQT HOC

TRADOC regulations for SQT development stress the importance of one feature that
should characterize an SQT: relative ease of administration (TRADOC Reg 351-2).

To meet this criterion, the SQT should be easy to administer by NCOs at the E6
level or higher, be accompanied by comprehensive and simple guidelines for test
administrators, and require only the personnel and equipment resources available
at a typical unit. These became the objectives for administering the TACFIRE
embedded SQT. Issues important for SQT administrative feasibility include qualifications
of test administrators, system configuration and status, test security,
and disposition of test results.

With the MAHOC as a reference, administering the computer-based SQT is designed to
be a turn-key operation for the test administrator. He will be able to perform
test set-up procedures explained in the MAHOC, administer the SQT to soldiers, and
collect the examinee test results with no prerequisite computer programming
experience or outside assistance.

The  SQT  HOC  at  either  skill  level  3  or  4  can  be  administered  on  any  existing  TACFIRE 
system  configuration,  that  is,  battalion  or  division  artillery,  at  centralized 
locations  or  field  unit  locations.  The  SQT  materials  written  in  PLANIT  courseware 
are transportable via the standard TACFIRE input-output medium, magnetic tape cartridge.
The  load  procedures  are  comparable  to  those  performed  by  an  ACCO  for  TACFIRE 
tactical  system  software.  In  the  event  of  external  threat  or  other  emergency,  the 
TACFIRE  system  can  be  restored  to  operational  status  immediately  by  loading  tactical 
system  software. 

Test security for the computer-based SQT depends on safeguards against compromise
by those who are not authorized to administer the SQT and safeguards against
unauthorized modifications to the courseware. The PLANIT software provides for both
safeguards since log-in identifications used by operators and examinees restrict
their access to only those functions in the courseware intended for their use.
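
The idea of restricting each log-in to its intended functions can be sketched as a simple lookup. This is not PLANIT's actual mechanism, which the paper does not describe. The role names and function names below are hypothetical, and the "author" role is added only to show where courseware modification would sit.

```python
# Hypothetical mapping from log-in role to the courseware functions it may use.
PERMISSIONS = {
    "examinee": {"take_test", "view_own_feedback"},
    "operator": {"load_test", "start_session", "print_record_sheet"},
    "author": {"edit_courseware"},
}


def is_allowed(role, function):
    """Return True only if the function is among those intended for the role."""
    return function in PERMISSIONS.get(role, set())
```

In this sketch an examinee log-in cannot reach courseware editing, and an unrecognized log-in is denied everything.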

For  the  computer-based  SQT,  the  scoring  procedure  will  differ  from  the  conventional 
approach.  The  TACFIRE  computer  will  perform  the  scoring  for  each  examinee,  tabulate 
test  results,  and  print  a  complete  HOC  Record  Sheet.  This  record  sheet  can  be 
transcribed  directly  to  the  SQT  marksense  form  used  by  the  training  standards 
officer  to  compile  test  results. 

4.2  Maintainability Issues and Procedures

There  are  four  general  levels  of  maintenance  for  the  computer-based,  embedded  SQT 
considered  important  for  an  overall  goal  of  ease  of  maintenance.  The  levels  of 
maintenance  are  listed  in  Table  4  ordered  from  specific  to  general,  together  with 
the  primary  objective  associated  with  each. 

The aspects of TACFIRE equipment maintenance relevant to actual system operation
also affect SQT operation. In the event of equipment malfunctions, the testing is
halted but no performance data are lost from core memory. When the system is
restored, PLANIT automatically resumes HOC testing at the point where each examinee
was stopped.
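
The halt-and-resume behavior can be illustrated with a small Python sketch in which each examinee's position and results are kept while testing is halted. It models only the behavior described in the paragraph, not how PLANIT or the AN/GYK-12 retains data in core memory. The class and method names are hypothetical.

```python
class TestSession:
    """Tracks each examinee's progress so testing can resume where it stopped."""

    def __init__(self, test_ids):
        self.test_ids = list(test_ids)
        self.position = {}   # examinee -> index of the next hands-on test
        self.results = {}    # examinee -> scores recorded so far
        self.halted = False

    def record(self, examinee, score):
        """Record the score for the examinee's current test and advance."""
        if self.halted:
            raise RuntimeError("testing is halted")
        self.results.setdefault(examinee, []).append(score)
        self.position[examinee] = self.position.get(examinee, 0) + 1

    def halt(self):
        """Stop testing (for example, on equipment malfunction); keep all data."""
        self.halted = True

    def resume(self):
        """Restart testing and report the next test for each examinee."""
        self.halted = False
        return {
            examinee: self.test_ids[i] if i < len(self.test_ids) else None
            for examinee, i in self.position.items()
        }
```

After a halt, the recorded results are unchanged and `resume()` returns the test each examinee should take next.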

Annual revisions to an SQT mandated by TRADOC are accommodated by a large base of
validated hands-on tests. The SQT developers need only select a different subset
of hands-on tests each year to optimally represent extant job duties. This selection
requires no modifications to the courseware.
TABLE 4.  MAINTENANCE OBJECTIVES

Level of Maintenance                  Primary Objective
------------------------------------  ------------------------------------
1.  Maintenance for TACFIRE           Ensure test reliability by
    equipment on which the SQT HOC    minimizing equipment failure
    is administered.                  and down time.

2.  Annual revisions to SQT contents  Provide flexibility to use developed
    in accordance with TRADOC         and validated hands-on tests for
    regulations.                      more than one test period.

3.  Modifications and additions to    Provide capability to implement
    the courseware comprising         modifications and additions to the
    hands-on tests.                   courseware interactively, easily,
                                      and rapidly.

4.  Enhancements and troubleshooting  Provide quality assurance for
    support for the software system.  system software.

New interface requirements and operating system software releases will drive the
training and testing requirements for TACFIRE and any evolving, complex system.
Supplementing the TACFIRE SQT with new or revised test materials is an important
capability to accommodate extensions in the scope of the SQT. A technician with
computer programming experience has the qualifications to edit existing courseware
and develop new courseware for the TACFIRE embedded SQT. The more common types of
courseware modifications that will be required are quite simple and can be accomplished
interactively from the ACC terminal at any time.

At  the  most  general  level  of  maintenance  for  the  computer-based  SQT  is  support  of 
any  enhancements  for  the  PLANIT  operating  system  software.  The  evidence  for 
maintainability  and  supportability  of  the  PLANIT  software  is  derived  from  many 
trouble-free  installations  of  PLANIT  nationwide,  spanning  a  wide  variety  of 
applications. 


5.0  CONCLUSIONS 

Conclusions warranted by the results of the computer-based SQT development and
validation may be relevant both as an overview of the present work and as research
and development issues for other SQTs designed to be delivered on operational tactical
data systems. First, the TACFIRE embedded SQT appears to be enthusiastically
accepted by the user community. The extensive involvement of subject matter
experts and TRADOC personnel during the development, validation, and review
process, together with the developer's responsiveness to TRADOC regulations and
user input, are thought to be important contributors to this acceptance.

The TACFIRE application of computer-based, embedded HOC testing is feasible and
effective in fulfilling the intent of the Army SQT program. Performance of hands-on
tests on the TACFIRE equipment closely resembles the performance of actual job
duties. The TACFIRE computer accomplishes objective and accurate scoring of
hands-on tests. The SQT is also feasible from the standpoint of administration
and maintenance, requiring no programming experience to administer and no other
resources or equipment than those available at the typical field unit. The PLANIT
courseware materials are structured in a modular format and are easy to modify and
supplement.
Compliance with TRADOC regulations for SQT policy and procedures by the computer-based
SQT does not differ significantly from conventional SQTs. The validation plan,
expert tryout, soldier tryout, data analysis, and test content revisions were conducted
in accordance with regulations. The MAHOC closely follows the TRADOC recommended
format. Test results are computer-generated in a form compatible with
TRADOC SQT scoring procedures.

PLANIT has been a successful vehicle for development of the TACFIRE embedded SQT.
PLANIT capabilities and advantages include: extensive record keeping of student
performance; analysis of incorrect answers to provide detailed feedback; transportability
of courseware; ease of courseware modification. The success of PLANIT
in the implementation of the TACFIRE embedded SQT recommends it as a candidate for
other developments on computer-based tactical data systems.
A Simulated Aircraft Landing Test as a Pilot Selection Test


Pilot selection in the Canadian Armed Forces has been a continuing
concern, especially since attrition rates for trainees have been as high
as fifty percent on a few courses. In the reported study, the utility
of a custom-developed, microprocessor-driven aircraft landing test
(ALT) was examined in terms of its added value to the current pilot
selection battery. A CROMEMCO II microcomputer was programmed to
simulate the landing of a light aircraft where visual stimulation was
presented on a CRT and the candidates used a "joystick" and throttle to
perform three landing tests. Several dependent measures were automatically
recorded every 500 msec. This ALT was a further development
of earlier research conducted on another system (Fowler, 1981).

One hundred and fifty male candidates applying for military flying
training were tested on the ALT as well as on the current test battery,
which consists of a psychomotor test in an aviation tester, pencil-and-paper
tests which tap verbal and quantitative aptitudes, and a
memory test. The candidates also completed a measure of cognitive
style, the Group Embedded Figures Test, and selected scales from
Jackson's Personality Research Form.


It was found that performance on the ALT was independent of
performance on the current test battery as well as performance on the Group
Embedded Figures Test and the selected scales from Jackson's
Personality Research Form.


Future research into the use of the ALT as a selection device
will  take  place  once  trainees  have  completed  the  primary  flying 
training  course. 
AN INVESTIGATION INTO THE USE OF AN AIRCRAFT
LANDING  TEST  (ALT)  IN 
AIRCREW  SELECTION 


Captain  P.B.  Lessard 

Canadian  Forces  Personnel  Applied  Research  Unit 
Suite  600,  4900  Yonge  Street 
Willowdale, Ontario M2N 6B7


INTRODUCTION 


Background 


The Canadian Armed Forces (CAF) has a continuing need to improve its
Aircrew Selection Battery (ASB) to ensure its predictive efficiency. The
importance of psychomotor skills (i.e., the ability to co-ordinate,
manipulate and/or repeat precise body or limb movements) and of the ability
to respond to visual stimuli has long been recognized as critical for success
as a pilot. The United States Air Force (USAF), in a study of perceptual motor
and cognitive performance (Imhoff & Levine, 1981), found the predictive validity
of the various perceptual-motor and psychomotor tests used in pilot selection
to be the most significant.

A study of student pilot attrition (Ring & Eddowes, 1976) in the USAF
Undergraduate Pilot Training (UPT) program found that failed students
typically listed pre-solo landing, loss of confidence and final
turn-approach-flare as areas which led to their subsequent elimination from
pilot training. During personal interviews with Royal Military College cadets
who had no previous flying experience, Capt. G.A. VanDyke (Memorandum, March
1981) of the CAF reported that 10 of the 24 failed cadets noted that landings
were by far the most difficult part of the course. These results indicate
that an aircraft landing test might help identify potential failures
at the selection stage.

Also, the CAF is in the rather unusual position of testing and selecting
persons for pilot training from different cultural and language groups within
the country. The use of performance tasks rather than pencil-and-paper tests
could help eliminate cultural biases which are often found in translated tests.

Hypothesis

The  hypothesis  is  that  there  is  a  significant  positive  correlation  between 
performance  on  the  ALT  and  performance  on  the  ASB,  particularly  with  the 
Visual  General  Aviation  Tester  (VGAT). 


METHOD 


Subjects


The  ALT  was  administered  to  a  sample  of  107  male  Anglophones  between  the 
ages  of  17  and  26,  all  applying  for  flying  training  in  the  Canadian  Armed 
Forces. None of the applicants had previous flying experience. The education
level of the applicants ranged from Junior Matriculation (high
school graduates) to Direct Entry Officers (DEOs), those with a college or
university education. All applicants had been prescreened using the Canadian
Forces General Classification (CFGC) test and had undergone an initial
interview to assess officer potential. All subjects, whether or not selected
for pilot training, were used in the analyses of the ALT.

Apparatus 

The ALT, driven by a Cromemco microcomputer, was used in conjunction
with  a  cathode  ray  tube  (CRT)  display  to  simulate  the  approach  and  landing  of 
a  light  aircraft.  The  throttle  was  positioned  in  front  of  the  control  column 
to  accommodate  both  left-  and  right-handed  persons.  An  adjustable  chair  was 
used  to  ensure  all  subjects  were  looking  directly  at  the  CRT.  Approach,  stall 
and  structural  damage  speeds  were  noted  on  the  left-hand  side  of  the  console 
as  reminders  to  the  subject. 

The simulation was that of a cross (the aircraft), which was to be
flown through an approach to a safe landing using as instruments a horizon
bar, airspeed indicator and altimeter, which were controlled by throttle and
control column movements. The approach commenced at 900 feet altitude, 2700
feet from the threshold of the runway. The approach was terminated when one
of four events occurred: a safe landing, a stall (airspeed less than 50 mph),
a crash (airspeed in excess of 75 mph or less than 65 mph on landing) or
structural damage (speed in excess of 150 mph). All parameters were recorded
on hard copy using a Texas Instruments printer. Figure I is a block diagram
depicting the CRT and accessories which are included in the design of the ALT.
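
The termination rules described above can be sketched as a small classifier. This is an illustration only: the paper gives the speed thresholds but not the ALT program, so the function name, its arguments, and the order in which the checks are applied are assumptions.

```python
def classify_approach(airspeed_mph: float, touched_down: bool) -> str:
    """Return the outcome of a simulated approach, or 'in progress'.

    The thresholds are the ones quoted in the text. The function itself,
    and the order of the checks, are hypothetical.
    """
    if airspeed_mph > 150:
        return "structural damage"   # speed in excess of 150 mph
    if airspeed_mph < 50:
        return "stall"               # airspeed less than 50 mph
    if touched_down:
        # On landing, airspeed above 75 mph or below 65 mph is a crash.
        if airspeed_mph > 75 or airspeed_mph < 65:
            return "crash"
        return "safe landing"
    return "in progress"


print(classify_approach(70, touched_down=True))
```

Under these assumptions a touchdown counts as safe only when the airspeed lies between 65 and 75 mph.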

Procedure 

Subjects processed through the Aircrew Selection Centre
(ASC) are selected for pilot training based on the results from a battery of
pencil-and-paper tests and a psychomotor test, the VGAT. The ALT was
administered to the subjects as part of the ASB without advising them that it
was an experimental test, to ensure they made every effort to do as well as
possible. The results obtained on the ASB, or the ALT, were not released to
the subjects.

The administration of the ALT, which took approximately 30 minutes,
consisted of a briefing given to all applicants and a testing period. The
briefing consisted of an outline of all instruments and controls, plus a
demonstration of all controls, their function, airspeeds to monitor, and the
symbology and warnings that could appear on the screen. It should be noted
that the ALT had no rudder control, and all turns were made by banking. The
instructor also demonstrated a successful landing. Subjects were tested
individually.

At the conclusion of the demonstration the subject had the opportunity
to practise co-ordinating the controls and flying to a targeting circle. Once
this familiarization exercise was complete, the candidate attempted three
landings, which were scored, keyed and stored on disc.
RESULTS


The ALT scores were recorded as successful versus failed (stalled or
crashed), a dichotomous variable. In addition, total time for the approach,
time outside the approach cone, time outside the approach speed, and distance
from the runway threshold when a successful landing, crash or stall occurred
were also scored.

There was no correlation between the number of successful landings
(mean = 1.44, standard deviation = 1.09) and scores from the Aircrew Selection
Centre battery pencil-and-paper tests. However, a significant relationship
was found between the Visual General Aviation Tester (VGAT) and the ALT average
landing (r = .20, p < .05; Table 1), accounting for approximately four percent
of the variance.
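
As a check on the "four percent of the variance" figure, the short sketch below squares the VGAT correlation reported in Table 1 and converts it to a t statistic using the standard formula for testing a Pearson correlation. The variable names are illustrative; r and n are read from Table 1.

```python
import math

r = 0.20   # Pearson r, VGAT Grand Total Score vs. ALT average landing (Table 1)
n = 101    # number of cases for that correlation (Table 1)

variance_explained = r ** 2
# t statistic for a Pearson correlation, with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"variance explained: {variance_explained:.2f}")
print(f"t = {t_stat:.2f} on {n - 2} degrees of freedom")
```

The squared correlation is .04, and the t statistic of roughly 2.03 lies just above the two-tailed .05 critical value of about 1.98 for 99 degrees of freedom, in line with the asterisk in Table 1.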

Table 2, a t-test using extreme groups (those with zero successful
landings and those with all three landings successful), shows a
significant difference between the two groups on one test, serial addition.
The remaining seven tests showed no significant difference between the two
extreme groups.


DISCUSSION 


This preliminary study failed to show a significant positive
correlation between scores on the ALT and scores on the ASC battery,
specifically VGAT. Since both the ALT and VGAT are psychomotor tests, it was
felt there should be some relationship between the two. This study failed to
support the hypothesis. One of the major problems encountered in using the
ALT as a testing device is its lack of fidelity, that is, of the precision
with which a landing could be reproduced on the CRT. Furthermore, visual
feedback to the subject (i.e., airspeed, altitude, attitude) and kinesthetic
feedback (control column and throttle movement) are far from the quality
necessary to duplicate landing a light aircraft. In general, the graphics
package and mechanical controls require considerable work prior to conducting
another study. Specifically, the poor legibility of instructions and feedback
statements detracts from the test's reliability as a selection device. Before
dismissing the ALT out of hand, however, flying training data from the
Canadian Forces Flight Training School will be obtained and the ALT scores
correlated with them. A further investigation will therefore be carried out
in early 1982, when sufficient numbers of candidates have completed basic
flying training, to examine the relationship between flying training and the
ALT.
TABLE 1


PEARSON CORRELATIONS BETWEEN ALT AVERAGE LANDING
AND ASC TEST SCORES FOR THE MALE ANGLOPHONES


ASC TESTS                          PEARSON r

Numerical Ability (AB)               .02      (n=102)
Verbal Aptitude (TV)                 .10      (n=102)
Arithmetic Reasoning (AN)            .09      (n=102)
Reading Comprehension (TR)           .05      (n=102)
Math Reasoning (AR)                  .08      (n=102)
Instrument Interpretation (WC)       .16      (n=102)
Reading Tables (WT)                  .14      (n=102)
Serial Addition (AS)                 .16      (n=102)
VGAT Grand Total Score               .20*     (n=101)
General Classification (GC)          .05      (n=59)

* p < .05
TABLE 2


T-TEST ON SERIAL ADDITION SCORES
FOR MALE ANGLOPHONE GROUPS BASED ON ALT PERFORMANCE


                                                 POOLED VARIANCE ESTIMATE
                  NUMBER             STANDARD    T            DEG OF    TWO-TAIL
VARIABLE          OF CASES   MEAN    DEVIATION   VALUE        FREEDOM   PROBABILITY

SERIAL ADDITION
  GROUP 1           28       33.79     9.56      [illegible]    46      p < .05
  GROUP 2           20       39.95     8.44

Group 1 - extreme group which did not successfully land the ALT at all

Group 2 - extreme group which successfully landed the ALT on all three attempts
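
The t value in Table 2 is not legible as reproduced here, but the pooled-variance statistic can be recomputed from the group sizes, means and standard deviations that are. The sketch below does so under one assumption: that the printed standard deviations are sample standard deviations (n - 1 denominator). It is a consistency check, not the author's own computation.

```python
import math

# Summary statistics as printed in Table 2 (serial addition scores).
n1, mean1, sd1 = 28, 33.79, 9.56   # Group 1: no successful landings
n2, mean2, sd2 = 20, 39.95, 8.44   # Group 2: three successful landings

df = n1 + n2 - 2
# Pooled variance: the two sample variances weighted by their degrees of freedom.
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_value = (mean1 - mean2) / std_err

print(f"df = {df}, t = {t_value:.2f}")
```

The recomputed statistic has a magnitude of about 2.3 on 46 degrees of freedom, beyond the two-tailed .05 critical value of roughly 2.01, which agrees with the reported degrees of freedom and probability.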


FIGURE  I 
SUMMARY


The paper describes the results of a study designed to evaluate the utility of
a flight selection test for Royal Air Force pilot candidates. Fifty-three pilot
applicants acted as subjects in a controlled trial, based on a 14 hour light
aircraft course. Tests were marked in the air at 9 and 14 hours by independent
examiners. When all of the students had completed the RAF Basic Flying
Training course one year later, their test results were compared with
training outcomes. The results indicated a very high relationship between
flying test marks and probability of success in later training. The marks
awarded by the examiners were more predictive than the assessments of the
flying instructors. Based on the results of the trial, the RAF established a
Flying Selection Squadron and the paper presents some data from follow-up
studies which were carried out as a validation of the selection procedure.

INTRODUCTION 


Early in 1942, the Royal Air Force (RAF) introduced a change in its method of
selecting pilot cadets for pilot training. The modified plan involved
systematic assessment of flying performance after a limited amount of
standardised dual flying instruction.

The  introduction  of  this  more  analytical  and  objective  approach  was  a  radical 
departure  from  the  previous  selection  methods  which  were  based  largely  on  a 
general  subjective  impression  of  a  candidate's  suitability. 

This  method  of  selecting  potential  pilots  from  aircrew  candidates  on  the 
results  of  a  standard  'flight  test'  procedure  came  to  be  known  in  the  RAF  as 
Grading.

The introduction of Grading as a method of pilot selection in the RAF was, in
part, due to the transfer early in 1942 of the bulk of pilot training
overseas. The war situation at that time made it imperative that only cadets
who gave very high promise of successful training should occupy shipping
space (Air Ministry Report, 1945).


* Note:

Any views expressed are those of the author and do not necessarily represent
those of the AOC-in-C RAF Support Command or the Chief Scientist (RAF).
As a result of the introduction of Grading, attrition rates at all stages of
training were reduced significantly, particularly at the Elementary Flying
Training School (EFTS). The pilot aptitude ratings from the Grading tests
showed a high relationship not only with probability of failure but also with
the more discriminating criterion of proficiency assessments.

The post-War decline in the requirement for aircrew led to a shift in the
selection requirements of the RAF. The relatively expensive procedure of
flight Grading, which had been justified by the Wartime requirement, was no
longer deemed suitable to the needs of a contracting Service. For many years,
aircrew for the RAF were selected using a traditional pilot aptitude test
battery.

Grading was re-introduced briefly in the early '50s, with civilian pilots
acting as instructors and testing officers under contract to the Government.
However, poor standardisation and commitment on the part of the staff of
Grading schools resulted in unreliable testing practices and the scheme was
abandoned in favour of the less expensive and more reliable aptitude battery
(Ministry of Defence Publication, 1969).

THE  PRIMARY  FLYING  GRADING  TRIAL 


In 1973, the RAF introduced an all-jet flying training system; as a result,
the Primary Flying School (PFS), which had previously provided initial flying
training using a Chipmunk aircraft, was closed. This change in the training
system came at a time of great concern over the rising attrition rates in the
Basic Flying Training Schools (BFTSs) and an equal concern over reports of
falling predictive validity of the pilot aptitude tests. The net effect of
these concerns was that an experimental flying unit was formed with the
objective of testing the concept of reintroducing some form of flying
selection.

The Primary Flying Grading Squadron (PFGS), as the unit was called, came into
being at RAF Church Fenton in Yorkshire on 6 May 1974. The squadron was formed
from the disbanded PFS and was able to capitalise on the availability of
experienced flying instructors and support services of the old flying
school. The unit consisted of 6 Qualified Flying Instructors (QFIs), 4 as
instructors and 2 as examiners, and 5 Chipmunk aircraft.

The aim of the PFGS was to test the feasibility of using a short
light-aircraft flying course as a predictor of later success in RAF flying
training.

The PFGS syllabus was developed largely along the lines of the Canadian Armed
Forces Primary Flying Grading Squadron, 3 CFFTS, at Portage La Prairie. Inputs
were also made from the Federal German Air Force flying selection squadron at
Fürstenfeldbruck and the RAF Central Flying School.

The ground school programme of the course was designed to give the students
what was considered to be the minimum knowledge of airmanship and aerodynamics
necessary to complete the flying syllabus. In addition, each flying exercise
had an associated reading assignment which the student had to complete before
attending the relevant pre-flight briefing.
The flying syllabus laid great emphasis on 'attitude' flying, high workload
and maximum handling of the aircraft by the student from an early stage. To
this end, traditional basic handling exercises were compressed and only taught
in the way in which they were to be applied in the tests. The course was 15
flying hours, including a flexible allowance of one non-instructional
flying hour per student.

Instructor  standardisation  was  rigorously  pursued  before  and  throughout  the 
trial  using  mutual  check  rides.  Examiner  standardisation  was  an  even  more 
difficult  task  as  the  pilots  had  to  familiarise  themselves  with  a  complex  test 
booklet  which  was  required  to  be  marked,  in  the  air,  during  test  flights. 
Several  weeks  of  training  were  needed  before  the  examiners  were  able  to  reach 
acceptable  levels  of  inter-rater  reliability  and  checks  were  made  continuously 
throughout  the  trial  period. 

Social interaction between the examiners and the students was kept to a
minimum prior to testing and the examiners were not permitted to see student
training records. In this way, it was hoped that the tests would not be biased
by personal likes and dislikes or knowledge of a student's successes or
failures during the instructional phase of the course. The tests were designed
to be as objective as possible and for much of the flight the examiner acted
as a human flight data recorder, noting airspeed, bank and heading deviations.

The trial took almost a year to complete and, during this time, 53 students
were tested. Two tests were administered, one at 9 hours and one at 14 hours.
All results were passed directly to the Research Branch at HQTC as they became
available and they were kept secret until the last of the trial students had
completed the RAF 100 hour Basic Flying Training School (BFTS) course. No
students were failed as a result of taking part in the trial and their marks
were never passed to their flying units nor were they used to make any
decisions regarding their careers.

Figure 1 shows the predictive accuracy of the assessments. The white columns
show the biserial correlations of the cumulative subjective marks which were
awarded by the QFIs. The shaded columns show the biserial correlations of the
test marks alone.
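
For readers unfamiliar with the statistic, the sketch below shows the standard biserial correlation between a continuous mark and a pass/fail outcome. The function name and the marks and outcomes are hypothetical, invented purely for illustration; they are not trial data, and the paper does not say how its coefficients were computed.

```python
from statistics import NormalDist, mean, pstdev


def biserial(scores, passed):
    """Biserial correlation between continuous scores and a pass/fail outcome."""
    pass_scores = [s for s, ok in zip(scores, passed) if ok]
    fail_scores = [s for s, ok in zip(scores, passed) if not ok]
    p = len(pass_scores) / len(scores)   # proportion passing
    q = 1 - p                            # proportion failing
    # Ordinate of the standard normal curve at the point dividing p from q.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(pass_scores) - mean(fail_scores)) / pstdev(scores) * (p * q / y)


# Hypothetical marks and training outcomes (not from the trial).
marks = [42, 55, 61, 48, 70, 66, 39, 58, 73, 51]
outcome = [False, True, False, True, True, True, False, False, True, True]
print(round(biserial(marks, outcome), 2))
```

The biserial coefficient treats pass/fail as a cut on an underlying continuous proficiency, which is why it is a natural choice when the criterion is success or failure in later training.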


FIGURE 1. PREDICTIVE VALIDITIES OF INSTRUCTOR ASSESSMENTS AND TEST MARKS
It can be seen that, as the course progressed from sortie 1 to sortie 14, the
predictive accuracy of the assessments increased, although there was some sign
that they were levelling out. The tests, however, show a significantly higher
level of prediction at the same point in the course than the cumulative
assessments of the instructors after many hours of close contact with the
students.

At  the  end  of  each  course,  the  instructors  and  examiners  compared  notes  for 
the first time and decided on an overall Grade for each student on a scale
from  1  to  10;  it  was  nominally  decided  that  level  5  should  represent  the 
dividing  line  between  predicted  success  and  failure. 

The  Grade  was  based  on  a  consensus  opinion  about  the  likelihood  of  success  in 
BFTS.  Although  the  Grades  were  largely  influenced  by  the  test  marks,  there 
were some occasions where borderline cases were decided on the basis of the
instructors’  knowledge  about  the  personal  suitability  of  the  student  for 
training  within  the  RAF  or  other  overriding  reasons  why  the  test  marks  should 
be  modified,  such  as  airsickness  or  poor  continuity. 

Figure 2 below shows the distribution of the final Grades for the 53
students. The circles coloured black represent those who failed subsequent
Basic Flying Training. It can be seen that the overall Grades gave a very
accurate assessment; 18 students had been predicted as failures and, by the
end of the following year, 16 had indeed failed. Of the 35 predicted
successes, 31 subsequently passed Basic Flying Training.
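
The counts in the paragraph above can be laid out as a two-by-two prediction table. The sketch below does this under one interpretive assumption: that the 16 failures quoted are all among the 18 students predicted to fail. Variable names are illustrative.

```python
# Counts taken from the text; 53 trial students in all.
predicted_fail, predicted_fail_failed = 18, 16
predicted_pass, predicted_pass_passed = 35, 31

total = predicted_fail + predicted_pass
correct = predicted_fail_failed + predicted_pass_passed

hit_rate = correct / total
# Share of all eventual BFTS failures that were graded as predicted failures.
missed_failures = predicted_pass - predicted_pass_passed
failures_caught = predicted_fail_failed / (predicted_fail_failed + missed_failures)

print(f"{correct} of {total} predictions correct ({hit_rate:.1%})")
print(f"{failures_caught:.0%} of eventual failures were predicted")
```

On that reading, 47 of the 53 Grades agreed with the eventual training outcome.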


[Figure 2: dot plot of the final Grades, divided into "Predicted as
successes" and "Predicted as failures".]

O = Students who passed training.
● = Students who failed training.


FIGURE 2. DISTRIBUTION OF GRADES SHOWING THOSE STUDENTS WHO ...


Part of the rationale for introducing a flying selection trial at this time
was that the pilot aptitude tests, then in use at OASC, were unable to make
sufficiently accurate discriminations between the pilot applicants to ensure
low attrition rates at the RAF Basic Flying Training Schools. The validation
of the flying selection tests rested therefore, in part, on the comparison of
the predictive accuracy of the two measures. In particular, it was hoped that
the flying tests would complement the existing selection methods by showing a
low correlation with the pilot aptitude battery.
Figure 3 below shows the distribution of pilot aptitude scores for the 53
trial  students.  The  low  predictive  accuracy  of  the  pilot  aptitude  scores  at 
this  end  of  the  scale  is  evident.  Although  the  pilot  aptitude  tests  had  almost 
certainly  eliminated  many  totally  unsuitable  candidates  who  scored  at  the 
lower  end  of  the  complete  aptitude  scale,  it  was  clear  that  further 
discrimination  was  not  possible  within  the  remaining  applicants  without 
further  testing.  The  flying  selection  tests  proved  that  this  finer  level  of 
discrimination  was  possible. 
FIGURE 3. RELATIONSHIP BETWEEN PILOT APTITUDE SCORES AND SUCCESS
IN BASIC FLYING TRAINING

Following  the  publication  of  the  results  of  the  Primary  Flying  Grading  Trial, 
extensive  cost-benefit  analyses  were  carried  out  to  determine  the  effect  of 
introducing  flying  selection  into  the  RAF.  A  computer-based  manpower  and  cost 
model  was  devised  specifically  for  this  analysis.  The  model  was  able  to 
provide  accurate  cost  and  resources  figures  which  were  used  to  assess  the 
impact of adding an extra stage in the selection and training system of the
RAF. 


The model was also used to assess alternative solutions to the pilot
selection and classification problem, such as simulator selection or
improvements in the pilot aptitude battery. The analysis showed that a flying
selection system would be highly beneficial in terms of both costs and
manpower, particularly when it is used to supplement existing testing and
training procedures.

It had been known for some time that students who entered the RAF with more
than 30 hours of previous flying experience had a higher chance of success
than 'ab initio' pupils. The Flying Grading Squadron had little to say about
such candidates, and in addition was not designed for them. It was therefore
decided to eliminate the need to test such candidates further after initial
pilot screening at the Officer and Aircrew Selection Centre.

Students who enter the RAF through the University Cadetship system receive
flying training on a Bulldog aircraft during their college years. These
students were also considered unsuitable candidates for additional flying
selection.


Having established how flying selection could be most profitably employed
within the constraints of the RAF training and selection system, a decision was
made to establish Flying Selection as a filter for candidates with less than
30 hours of previous flying experience who were not University Cadets.
THE RAF FLYING SELECTION SQUADRON

The RAF Flying Selection Squadron (FSS) was established in September 1979 at
RAF Swinderby, using Chipmunk aircraft which had been in storage for such an
eventuality. The procedures of the FSS are largely those used during the trial
with some modification to the test content as a result of detailed item
analysis.

The course is scheduled for 6 weeks and comprises 10 or 11 candidates. The 14
hour syllabus has test points at the 9th and 14th exercise, each marked by a
different examiner; there is an instructor change after the first test. The
flying instructors use subjective assessments to describe the students'
performance after each exercise and, at the end of each phase of the course,
the instructor makes a prediction of the candidates' probability of success in
later flying training.

The  tests  are  marked  in  the  air  by  independent  examiners.  The  various  items  in 
the  test  are  weighted  and  combined  to  produce  an  overall  score.  The  examiners 
also  give  a  subjective  rating  of  the  students'  performance. 

The decision to either pass or fail a student is taken as a result of a
Disposal Board meeting which is convened as soon as possible after the
completion of a course. The Board is chaired by the Group Captain responsible
for RAF Flying Training at the Command Headquarters.

The executive members of the Board are the Officer Commanding the FSS and the
examiners. Advisory members include all of the flying instructors and a
Command Research Branch psychologist. The Board has, in attendance, a
representative from the Command Headquarters Personnel staff to advise on
alternative career placement for the failed students.

The Board notes the subjective assessments given throughout the course and
takes into account any special considerations such as sickness, personal
problems, poor flying continuity or adverse weather during a test sortie.
Initial subjective Grades are recorded and then compared with the objective
rank order; at this time any anomalies between the two are resolved. In this
way, a mixture of objective and subjective assessments produces a final FSS
Grade in which neither element overwhelmingly dominates the other.

At this stage any anomalies between the test marks and the subjective opinions
are resolved. The Chairman acts as the final arbiter for all borderline cases.

FSS has now been running for 2 years, during which time 352 candidates,
comprising 34 courses, have been processed. 83 of these candidates did not go
on to Basic Flying Training; this figure includes 8 students who withdrew from
the system at their own request. Thus, the overall FSS rejection rate is
approximately 25%. The present policy sets the minimum acceptable Grade at 6,
although the Chairman still retains the authority to pass students with a
Grade of 5 provided that he is satisfied that the training risk is
justifiable.
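
The throughput figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative.

```python
candidates = 352   # candidates processed by FSS
courses = 34       # courses run so far
not_to_bft = 83    # did not go on to Basic Flying Training
withdrew = 8       # of those, withdrew at their own request

rejection_rate = not_to_bft / candidates
rate_excluding_withdrawals = (not_to_bft - withdrew) / candidates
mean_course_size = candidates / courses

print(f"overall rejection rate: {rejection_rate:.1%}")
print(f"excluding voluntary withdrawals: {rate_excluding_withdrawals:.1%}")
print(f"mean course size: {mean_course_size:.1f}")
```

The computed rate is a little under 24%, which the text rounds to approximately 25%, and the mean course size of just over 10 matches the stated 10 or 11 candidates per course.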

To date, some 19 FSS courses have completed BFT and this sample of 153
students is sufficiently large to indicate that FSS is performing well.




Distribution of FSS Grades. Figure 4 below compares the FSS Board Grades with
overall BFT results. The overall level of prediction, as estimated from these
data, is very much in accordance with expectations based on the trial results.
Clearly one would expect some lowering of prediction with a real world system
due to the problems in maintaining high reliability of assessments and drift
in the marking standards, but the results so far are most encouraging.


[Figure 4: distribution of FSS Board Grades (vertical scale 1 to 10) by
training outcome.]

O - Students who passed BFTS
● - Students who failed BFTS*
X - Students rejected at FSS


*failure for airwork only.

FIGURE 4. THE RELATIONSHIP BETWEEN FLYING SELECTION GRADES AND
RESULTS OF BASIC FLYING SCHOOL


Effects of FSS on BFTS wastage rates. Table 1 below shows the Basic Flying
Training School wastage rate before and after the introduction of the Flying
Selection test. The results of this initial validation are most encouraging,
and reflect closely the predictions of the computer model of the selection and
training system referred to earlier.


Table 1. Basic Flying Training School suspension rate.*

Before the introduction of Flying Selection      23%
After the introduction of Flying Selection       16.1%

* Note: these rates only apply to non-University cadets with less than 30
hours flying experience before joining the Service. The overall BFTS
suspension rate is different from these figures.
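
To put the two suspension rates in proportion, the short sketch below expresses the change both in percentage points and relative to the earlier rate. It uses only the two figures from the table; the variable names are illustrative.

```python
before = 23.0   # BFTS suspension rate (%) before Flying Selection
after = 16.1    # BFTS suspension rate (%) after Flying Selection

absolute_drop = before - after           # in percentage points
relative_drop = absolute_drop / before   # as a fraction of the earlier rate

print(f"absolute reduction: {absolute_drop:.1f} percentage points")
print(f"relative reduction: {relative_drop:.0%}")
```

The fall of 6.9 percentage points amounts to a cut of about 30% in the suspension rate for the group to which the table applies.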


Relationship between FSS Grades and Pilot Aptitude Assessments. All candidates
who enter the FSS have been previously screened using the Pilot Aptitude
battery at the Officer and Aircrew Selection Centre at RAF Biggin Hill. The
Pilot Aptitude index is a composite score of paper and pencil, complex
co-ordination and other tests. Figure 5 below shows the relationship between
FSS Grades and the Pilot Aptitude index of the candidates.
O - Students who passed BFTS
● - Students who failed BFTS*
X - Students rejected at FSS

* failure for airwork only.

FIGURE 5. THE RELATIONSHIP BETWEEN PILOT APTITUDE SCORES


It can be seen that more FSS rejections and BFTS airwork failures now occur
amongst those students with aptitude scores of 120 and below. This result is
in line with the improved level of prediction from the current pilot aptitude
scores as a result of extensive test development. However, in spite of this
improvement in the aptitude testing battery, FSS continues to have an
additional discriminating effect, as shown by Figure 6 below, which illustrates
that the Flying Selection testing procedure is identifying potential BFTS
failures across the full range of aptitude scores. BFTS failures have not been
included in this Figure as many of the students are still under training.

FIGURE 6. RELATIONSHIP BETWEEN PILOT APTITUDE SCORES AND FLYING


Student retention and manpower implications. The present RAF Officer and
Aircrew Selection Centre tests make an initial division of aircrew applicants
into  pilots  and  navigators  as  a  result  of  their  performance  on  specific 
selection  criteria.  An  alternative  option  would  be  to  make  this  placement 
decision  after  Flying  Selection  and  this  option  is  currently  under 
investigation. 

Effects  on  navigator  targets,  standards  and  morale  will,  of  course,  have  to  be 
taken  into  account,  but  the  high  transfer  rate  of  rejected  FSS  candidates 
gives  cause  for  some  optimism.  Out  of  88  FSS  rejections  so  far,  only  6  have 
left the Service; a remarkable 93% retention rate. The majority, 60, were
transferred  to  navigator  training;  13  went  to  ground  branches  and  disposal 
action  for  the  remaining  9  is  still  being  considered. 
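
The retention figure quoted above can be checked directly from the counts given in the text. The following is a minimal Python sketch of that arithmetic; the variable names are illustrative only, and the nine candidates whose disposal is still pending are counted as retained, as the quoted 93% implies.

```python
# Outcomes of the 88 FSS rejections, as reported in the text.
fss_rejections = 88
outcomes = {
    "navigator training": 60,
    "ground branches": 13,
    "disposal pending": 9,
    "left the Service": 6,
}

# The four outcome groups should account for every rejected candidate.
assert sum(outcomes.values()) == fss_rejections

# Everyone who has not left the Service is treated as retained.
retained = fss_rejections - outcomes["left the Service"]
retention_rate = 100 * retained / fss_rejections

print(f"Retained: {retained} of {fss_rejections} ({retention_rate:.0f}%)")
```

The counts are internally consistent: 82 of the 88 rejected candidates remain in the Service, which rounds to the 93% given above.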

It would appear that, having obtained their commissions and put some useful
Service  experience  behind  them,  rejected  FSS  candidates  are  motivated  to 
remain  in  the  Service,  even  if  they  had  originally  expressed  the  intention  of 
being  a  pilot  or  nothing.  This  tends  to  support  the  view  that  Flying  Selection 
should  be  established  after,  rather  than  before  Initial  Officer  Training. 


CONCLUDING  REMARKS 


Since the end of the Second World War, an enormous amount of time and effort
has  been  expended  in  the  search  for  improved  pilot  selection  tests.  However, 
despite  this  considerable  and  continuing  research  investment,  it  has  been  said 
that  the  major  advances  in  testing  in  the  last  25  years  have  been  in  the  area 
of statistical methodology rather than in test content (North and Griffin,
1977). 

It is also worth noting that the development of tests of high predictive
validity is not, in itself, sufficient for them to be included in a pilot test
battery. As an example, in 1951 the US Air Force gave up the use of
psychomotor tests in selection in spite of their unique contribution to
selection validity (Cronbach 1970). Since that time many other selection
measures which have demonstrated high predictive validity have been excluded
from pilot selection batteries because of administrative, reliability and
quality control problems.

While there was little doubt that selection tests based on light aircraft
performance were able to produce the necessary levels of predictive validity,
the development of a practicable and reliable test format was a considerable
challenge.

In  spite  of  the  complexity  of  the  flying  environment,  the  test  methods  used  in 
the  Primary  Flying  Grading  Trial  in  1974  produced  very  high  rater  reliability. 
This  finding  is  in  line  with  similar  studies,  such  as  the  development  of  the 
'Illinois  Private  Pilot  Flight  Performance  Scale'  (Povenmire,  Alvares  and 
Damos  1970)  in  which  high  observer-observer  reliability  was  observed  in  a 
student  grading  test  situation. 

The justification for introducing Flying Selection into the RAF was, however,
based  on  a  very  detailed  analysis  of  the  costs,  benefits  and  manpower 
implications  of  multi-stage  selection  systems.  Without  a  detailed  front-end 
analysis  of  this  kind  the  full  benefits  of  Flying  Selection  may  not  have  been 
apparent.  Flying  Selection  is,  after  all,  an  expensive  form  of  testing,  and  it 
is  not  suitable  or  appropriate  for  all  candidates.  However,  it  is  also  highly 
predictive  and,  so  far,  reliable  and  administratively  convenient. 

Many  questions  remain.  Why  is  it,  for  example,  that  students  who  have  flown 
before  joining  the  Service  do  so  much  better  in  training?  It  is  clear  that 
they  have  skills  which  will  be  of  benefit  during  their  military  flying 
training  but  the  research  evidence  tends  to  suggest  that  the  important 
difference  may  lie  in  their  motivation  towards  a  flying  career. 

Flying  Selection  provides  an  opportunity  for  the  Air  Force  to  take  a  closer 
look  at  the  abilities  of  candidates  before  entering  them  into  a  very  costly 
training  system.  It  also  allows  the  candidates  with  no  experience  of  flying  to 
assess  their  own  abilities  and  motivation  within  a  supportive  environment. 

After  two  years  of  using  Flying  Selection,  the  RAF  has  proved  that  the  use  of 
standardised  flying  tests  can  add  a  level  of  discrimination  to  the  pilot 
selection  process  which  is  not  obtainable  in  any  other  way.  Maintenance  of 
reliability  will,  undoubtedly,  be  the  greatest  challenge  in  the  years  to  come 
but,  based  on  our  current  experience,  the  rewards  will  justify  the  effort. 

Gender Differences in the Aircraft Maintenance Career Field

Recently, the Air Force has experienced a substantial increase
in the number of women working in the non-traditional mechanics career
areas. To investigate any differences in the job expectations, experiences,
and attitudes of males and females working as Air Force aircraft
mechanics, a questionnaire was administered to 800 women and 1600 men
in the aircraft maintenance career field. Items were included in the
questionnaire to assess why the individual entered the career field,
past interest and experience in mechanics, expectations about the job
before entering the career field, experiences since entering the career
field, and attitudes toward the job, the Air Force and reenlistment.
The responses to the questionnaire were analyzed for overall
male/female differences, male/female differences within job types in
the career field and the relationships among different experiential
factors and attitudes, with specific emphasis on reenlistment intent.
Results are discussed in terms of gender differences, job type differences
and with respect to retention probabilities.

Gender Differences in the Aircraft
Maintenance Career Field


M.  Suzanne  Lipscomb 


Air  Force  Human  Resources  Laboratory 
Brooks  Air  Force  Base,  TX  78235 


Background 

During  the  past  decade,  the  number  of  women  serving  in  the  Air  Force  has 
increased from approximately 12,000 to more than 60,000. This has taken place
while the total active duty force strength has declined by approximately 30%.
There  are  now  over  20,000  women  working  in  job  specialties  which  were  once 
considered  traditionally  male  jobs,  reflecting  a  substantial  shift  in  the 
distribution  of  sexes  across  jobs.  For  example,  the  Air  Force  currently  has 
more  women  working  in  aircraft  maintenance  than  in  personnel  specialties. 

The substantial increase in numbers of women and their assignment into
many traditionally all-male career fields has made it increasingly important
for the Air Force to have detailed management information concerning the
characteristics of successful and unsuccessful women, the expectations and
attitudes of women, and the performance and utilization patterns of women in
the operational environment.

While women are assigned to many different technical specialties, there
are at present 1,700 women in the Tactical Aircraft Maintenance and
Airlift/Bombardment Aircraft Maintenance specialties, comprising approximately
11% of the total population of these two specialties. The large number of
women in these traditionally male career fields provides an excellent medium
for the comprehensive study of women working in non-traditional areas.

Bergmann & Christal (1978) reported a preliminary study investigating the
utilization of women in the Aircraft Maintenance career field. Using
occupational data collected by the Air Force Occupational Measurement Center
(AFOMC), a job-type analysis and an analysis of aptitude distributions were
conducted. Even though the data had not been gathered to investigate the
utilization patterns of women, and the sample of females was small, the study
did produce some interesting findings.

Results  from  this  study  indicated  that  there  were  significant  differences 
in  task  assignment  as  a  function  of  gender.  Within  the  same  specialty,  a 
higher  percentage  of  the  males  were  found  to  be  doing  actual  maintenance  tasks 
while  a  higher  percentage  of  females  were  doing  support  tasks.  The  data  also 
suggested  that  during  the  first  enlistment  there  was  a  movement  of  individuals 
from  maintenance  to  support  tasks.  However,  the  extent  of  this  movement 
appeared  to  be  much  larger  for  females  than  males.  While  few  differences  were 
found  in  the  tasks  performed  by  males  and  females  working  in  maintenance 
functions,  differences  were  found  in  the  tasks  performed  by  men  and  women 
working in support functions. Women spent more of their time than did men
performing administrative or clerical tasks. These tasks were rated by
supervisors as being more difficult than those tasks performed by men in
either maintenance or support jobs. Finally, no significant differences were
found in job attitudes of males and females in either support or maintenance
functions.

The analysis of aptitude distributions conducted in this study suggested
that the mechanical aptitude requirements, used for initial classification and
assignment and historically predictive of success for males, might not be
appropriate for females. Many of the women in the sample did not qualify for
entrance into the career field with their scores on the mechanical aptitude
test but were admitted into the career field during the period when entry was
allowed based on either mechanical or electronic aptitude scores.
Nevertheless, these individuals had graduated from technical training school
and were functioning in the career field and performing maintenance jobs.
However, no data were available on their performance levels or on individuals
no longer in the specialty.

The findings of this exploratory research identified the necessity of
obtaining data from a larger and more representative sample of males and
females in these specialties. These data would include information on the
individuals entering the career field, those leaving the career field, and the
jobs, tasks, attitudes, and performance levels of those currently in the
career field. Information obtained in these areas would provide Air Force
management with a comprehensive picture of the utilization of both women and
men in the aircraft maintenance career field and allow planning to ensure the
optimum return on personnel investments. The Manpower and Personnel Division,
Air Force Human Resources Laboratory (AFHRL), has initiated a multi-year
research program designed to assess: 1) the characteristics of the aircraft
maintenance career field input population; 2) the characteristics of those
leaving this career field; 3) the on-the-job utilization patterns within the
career field; 4) job expectations and attitudes of career field incumbents;
and 5) the on-the-job performance of personnel in this career field. This
paper presents the preliminary findings of this research program
concerning the job expectations, experiences and attitudes of men and women in
the Aircraft Maintenance career field. Investigation in the other areas of
interest is continuing and results will be reported at a future time.

Method 


Survey  Construction 


In  April  1980,  an  occupational  attitude  and  experience  survey  was 
administered  in  conjunction  with  the  routine  occupational  survey  of  the 
Aircraft  Maintenance  career  field  conducted  by  the  Air  Force  Occupational 
Measurement  Center.  The  survey  battery  contained  a  background  section  in 
which  job  incumbents  provided  demographic  information  about  themselves,  their 
job satisfaction, reenlistment intent, TDY experience, shift work and
equipment usage. This section was followed by a task list covering 23 duties
and  1045  tasks  to  assess  the  relative  time  spent  on  tasks  performed  by  the 
respondent.  The  task  list  was  followed  by  a  final  section  addressing  job 
expectations,  attitudes,  and  experience.  Responses  to  the  task  list  are 
undergoing  a  separate  analysis  which  will  be  reported  at  a  later  date. 

The job expectations and experience items were designed to identify why
the incumbent entered the career field, their past interest and experience in
the mechanical area, their expectations about the job before entering the
career field, and their experiences since entering the career field. It is
important to know if men and women enter the career field for the same or
different reasons, have differing expectations, have different experiences on
the job and how these factors affect such things as attitudes and intention to
reenlist. The first items in the questionnaire were concerned with the
reasons why the respondent entered the Air Force, and the reasons why they
entered the aircraft maintenance career field. Respondents were also asked
whether or not aircraft maintenance was their first career field choice and
what their career plans were when entering the Air Force. These questions
were raised by a study conducted in 1974 in which over half of the women in a
sample  of  technical  school  trainees  reported  they  had  chosen  aircraft 
maintenance  because  it  was  the  only  field  open  when  they  enlisted  (Longridge, 
1974).  Also  relating  to  attitudes  held  before  entering  the  career  field, 
respondents  were  asked  about  their  prior  experience  and  interest  in  the 
mechanical  area.  As  an  indication  of  current  interest  in  the  area,  they  were 
asked  if  they  would  try  to  find  a  job  in  a  mechanical  field  if  they  were  to 
leave  the  Air  Force. 

In  order  to  examine  the  expectations  held  by  men  and  women  before  they 
entered  the  career  field,  a  series  of  questions  were  asked  concerning  how 
their  work  compared  to  what  they  had  expected.  The  items  covered  technical 
difficulty,  physical  strength  requirements,  workload,  and  environment.  These 
same  areas  were  also  covered  by  questions  relating  to  how  the  respondents'  job 
had  changed  since  entering  the  career  field.  A  series  of  questions  concerning 
job  satisfaction  were  asked  followed  by  questions  about  the  amount  of 
assistance  required  on  their  job  and  specific  problems  with  technical  tasks 
and physical requirements. The final items in the questionnaire concerned
supervisor attitudes, the male/female composition of the work group, and the
perceived effectiveness of that work group.

Data  Collection 


The  survey  battery  was  administered  in  the  field  to  100  percent  of  the 
women  in  the  career  field  who  were  available  to  complete  the  battery  and  two 
men  in  the  career  field  for  every  woman  surveyed.  Women  and  men  were  matched 
on length of military service to eliminate any systematic bias due to the fact
that the average length of military service of males in the career field, as a
whole, is longer than the average length of military service of the women in
the career field.
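
The two-for-one matching described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the paper does not say how closely service lengths had to agree, so exact agreement in months is assumed, and the function name and the small roster are hypothetical.

```python
import random
from collections import defaultdict

def match_two_men_per_woman(women, men, seed=0):
    """Pair each woman with two unused men who have the same months of service.

    `women` and `men` are lists of (person_id, months_of_service) tuples.
    Exact matching on months is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    # Group the available men by length of service, in random order.
    pool = defaultdict(list)
    for man_id, months in men:
        pool[months].append(man_id)
    for ids in pool.values():
        rng.shuffle(ids)

    matches = {}
    for woman_id, months in women:
        candidates = pool[months]
        if len(candidates) < 2:
            continue  # no complete match available; this woman stays unmatched
        matches[woman_id] = (candidates.pop(), candidates.pop())
    return matches

# Hypothetical roster, for illustration only.
women = [("W1", 12), ("W2", 36), ("W3", 36)]
men = [("M1", 12), ("M2", 12), ("M3", 36), ("M4", 36),
       ("M5", 36), ("M6", 36), ("M7", 48)]

matched = match_two_men_per_woman(women, men)
for woman_id, pair in matched.items():
    print(woman_id, "->", pair)
```

Matching of this kind gives the two sub-samples the same distribution of service length, so that any gender difference in the responses cannot simply reflect the longer average service of men in the career field.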

Data  Analysis 


The responses to this questionnaire were analyzed to identify
statistically significant male/female differences. Most items had options
reflecting an underlying continuum of possible responses, and significant
gender differences among these items were identified through t ratios computed
using the Bonferroni technique for multiple comparisons. In addition to
identifying items having significant gender effects, the strength of these
effects was evaluated through a stepwise discriminant analysis algorithm, in
terms of the amount of gender-discriminating variance each item accounted
for. For those few items having categorical response options,
a chi-square analysis was performed to identify significant gender differences.

RESULTS


An  overall  description  of  the  sample  obtained  for  analysis  is  shown  in 
Table  1.  Again,  the  intent  of  the  sampling  strategy  was  to  eliminate  any 
systematic  bias  resulting  from  differences  in  length  of  military  service 
between  males  and  females.  Although  there  are  some  minor  differences  between 
the  female  and  male  sub-samples,  overall,  the  men  and  women  appear  well 
matched.  The  only  difference  worth  noting  is  that  women  report  being  in  their 
job  a  shorter  length  of  time  than  the  men.  This  is  not  to  be  confused  with 
time  in  service  or  in  the  career  field. 

Table 1. Sample Characteristics

                                 Women (n=797)     Men (n=1594)
Item                               Mean       SD     Mean       SD

1. Age                            24.70     3.39    23.76     2.76
2. Years of Education             12.43      .94    12.20      .70
3. Months of Military Service     36.71    20.07    36.71    20.07
4. Military Grade                  3.57      .87     3.50      .81
5. Months in Career Field         33.93    20.09    33.77    19.62
6. Months in Job                  16.02    12.47    19.88    13.62
7. No. Supervised                   .44     1.25      .72     1.93


Table 2 shows the results of the chi-square analyses of the
questionnaire items having categorical response options. Statistically
significant (p < .001) gender differences were found on all of these items
except the two concerning supervisors. As shown in Table 2, the reason most
given for entering the aircraft maintenance career field by both males and
females was that it was the area of strongest interest. However, differences
can be noted in all response categories.
Similar  to  the  findings  of  the  Longridge  study,  females  were  more  likely  to 
have  been  persuaded  by  a  recruiter  or  to  have  entered  the  career  field  because 
it  was  the  only  one  open  to  them.  Differences  can  also  be  noted  in  the 
reasons  given  for  entering  the  Air  Force.  A  higher  percentage  of  males  than 
females  report  entering  the  AF  to  gain  training  for  use  in  civilian  life  or  to 
serve  their  country.  Females  were  more  likely  than  males  to  report  entering 
the  Air  Force  for  college  or  educational  benefits  or  for  adventure,  excitement 
and  travel.  When  asked  about  their  career  plans  upon  entering  the  career 
field,  a  larger  percentage  of  females  than  males  reported  that  aircraft 
maintenance  was  not  their  first  career  field  choice.  Similarly,  a  higher 
percentage  of  the  females  than  males  planned  to  crosstrain  out  of  the 
specialty.  A  higher  percentage  of  the  female  sample  attended  resident 
technical training and females were more likely to work the day shift as
opposed  to  other  shifts.  No  differences  were  found  as  to  whether  supervisors 
were  male  or  female,  military  or  civilian. 

Table 2. Chi-square Analyses

                                                        % Females    % Males
Item                                                      (n=797)   (n=1594)   Chi-Square

1. What was the main reason you entered the
   aircraft maintenance (43XXX) career field?
   1. Only career field open when I enlisted                25.41      13.17      190.69*
   2. Previous civilian experience in the field               .51       2.49
   3. Training for use in civilian life                      3.68      15.66
   4. Area of strongest interest                            33.55      47.19
   5. Persuaded by recruiter                                28.97      17.20
   6. Other                                                  7.88       4.28

2. What was the main reason you enlisted in
   the Air Force?
   1. Training for use in civilian life                     10.76      28.63      147.43*
   2. College/educational benefits                          31.63      21.60
   3. Adventure/excitement/travel                           25.33      16.61
   4. Influenced by someone in the military                  6.79       6.45
   5. To get away from a personal problem                    5.25       4.60
   6. Couldn't find any other job                            5.89       6.84
   7. To serve my country                                    6.40      10.93
   8. Can earn more money in the Air Force than              1.92        .32
      as a civilian
   9. Other                                                  6.02       4.03

3. In your present job, which of the following
   most closely describes the schedule you normally work?
   1. Day shift (such as 0800-1600)                         55.96      41.61       43.40*
   2. Other                                                 44.04      58.39

4. What were your career plans when you entered the
   aircraft maintenance (43XXX) career field?
   1. I planned to stay in the 43XXX career field for       49.43      57.77       36.67*
      at least my first term of enlistment
   2. I planned to cross train out as soon as               20.88      11.74
      possible
   3. I had no plans                                        29.69      30.49

5. Was aircraft maintenance (43XXX) your first career
   field choice?
   1. Yes                                                   60.41      71.46       28.61*
   2. No                                                    39.59      28.54

6. How were you assigned to your present career
   ladder? Blacken only one circle on this line.
   1. Completed resident technical training                 80.03      72.22       25.69*
   2. Reclassified without completing technical               .91       1.54
      training or OJT
   3. Directed duty assignment (DDA) from basic              3.24       6.82
      training to OJT without bypass test
   4. DDA from basic training by bypass test                  .26        .96
   5. Converted from another AF specialty without             .52       1.35
      training by classification board action
   6. Retrained from another specialty                       1.04       1.16
   7. Reenlisted after prior service in USAF or              1.17       1.54
      from another branch of service
   8. Not assigned to my career ladder by any of            12.82      14.41
      the above methods

7. Is your immediate supervisor male or female?
   1. Male                                                  98.11      98.54         .37
   2. Female                                                 1.89       1.46

8. Is your immediate supervisor military or civilian?
   1. Civilian                                               3.91       4.70         .60
   2. Military                                              96.09      95.30

*Significant at .01 level.
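
The chi-square values in Table 2 can be approximated from the reported percentages. The sketch below is an illustration only: it assumes that every respondent answered the item, so that counts can be rebuilt as percentage times n, and it applies Yates' continuity correction to the 2 x 2 table, which the paper does not mention. The function names are hypothetical.

```python
def chi_square_2x2(table, yates=True):
    """Pearson chi-square for a 2x2 table of counts, optionally Yates-corrected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            deviation = abs(observed - expected)
            if yates:
                # Continuity correction: shrink each deviation by one half.
                deviation = max(deviation - 0.5, 0.0)
            statistic += deviation**2 / expected
    return statistic

def counts_from_percentages(percentages, n):
    """Convert reported percentages back to approximate counts (assumes no missing data)."""
    return [pct / 100 * n for pct in percentages]

# Item 3 (work schedule): day shift vs. other, from Table 2.
females = counts_from_percentages([55.96, 44.04], 797)
males = counts_from_percentages([41.61, 58.39], 1594)

chi2 = chi_square_2x2([females, males])
print(f"chi-square = {chi2:.2f}")
```

Under these assumptions the work-schedule item comes out close to the 43.40 reported in Table 2. Items with missing responses, or with more than two response categories, would need the actual counts and a general contingency-table routine.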


A total of 54 questionnaire and background items with continuous response
options were analyzed for male/female differences. T-values were computed to
identify significant differences between male and female sample means and a
Bonferroni adjustment for multiple comparisons was applied in the test for
significance. As shown in Table 3, 25 items were found to be significant at
the .01 level.

To determine practical significance, the same 54 items were entered into a
stepwise discriminant function analysis with gender as the dependent
variable. A total of 13 variables were identified as significantly
differentiating between men and women at the .001 level, and accounted for 38%
of the variance discriminating between genders. The first six variables
entering into the discriminant function accounted for 34% of the total gender
variance. However, it was found that variables entering the stepwise
algorithm after the sixth item each accounted for less than 1% of the gender
variance. Therefore, it was felt that only the first six variables merited
discussion.

As shown in Table 4, the item asking "If you left the Air Force, would you
try to find a job in the mechanical area?" was the best single discriminator
between males and females, accounting for approximately 18% of the gender
variance. Females were found significantly less likely to attempt to find a
mechanical job in the civilian sector after separation than were the males.
Other significant discriminators indicated that women reported less prior
experience than males in the mechanical area and females found that their jobs
required somewhat more physical strength than they expected, whereas males
reported their jobs required slightly less physical strength than expected.
Women reported working in duty sections with a slightly higher proportion of
females than did males and were less inclined to leave the Air Force than were
the males. When asked how much confidence their supervisors had in them
initially, women reported perceiving less confidence than males. It is
interesting to note that when asked how much confidence their supervisors have
in them now, there was virtually no difference in the amount of confidence
males and females reported perceiving. It is also of interest to note
that on many items in which differences might have been expected, such as the
amount of assistance required and the amount of TDY, meaningful gender
differences were not found. As in the Bergmann and Christal study,
differences in job attitudes were not found to have practical significance.

CONCLUSIONS 

Overall, the analysis of the job expectations, experiences, and attitude
data indicates that some significant differences do exist between males and
females  in  their  reasons  for  entering  the  Air  Force  and  the  Aircraft 
Maintenance  career  field,  in  their  previous  mechanical  experience  and  their 
plans  for  civilian  work. 

Differences  were  also  indicated  in  the  area  of  expectations  about  the 
amount  of  strength  required  on  the  job,  initial  supervisor  confidence  and 
desire  to  leave  the  Air  Force.  However,  differences  were  not  found  to  be 
gender  specific  in  other  areas  of  expectations,  experiences  and  attitudes. 
Overall,  satisfaction  with  the  Air  Force  and  their  jobs,  current  supervisory 
confidence,  job  difficulties  and  job  changes  were  not  found  to  have 
discriminative  gender  significance. 

Table 3. Questionnaire Items: Summary Statistics

                                                     Females (n=797)  Males (n=1594)
Item                                                    Mean      SD    Mean      SD  t ratio*

1. Present Grade                                        3.57     .87    3.50     .81     1.88
   1. E1-AB      2. E2-AMN     3. E3-A1C
   4. E4-Sgt     5. E5-SSgt    6. E6-TSgt
   7. E7-MSgt    8. E8-SMSgt   9. E9-CMSgt

2. Total Time in Career Field (months)                 33.93   20.09   33.77   19.62      .19

3. Time in Present Job (months)                        16.02   12.47   19.88   13.62    -6.79*

4. Circle the highest school grade or college/         12.43     .94   12.20     .70     6.16*
   university year completed (include equal level,
   like GED, but not special training, like
   vocational, outside regular school)
   01  02  03  04  05  06  07  08  09  10  11  12  13  14  15  16  17  18

5. In your present job, which of the following          1.44     .49    1.58     .49
   most closely describes the schedule you
   normally work?
   1. Day shift (such as 0800-1600)
   2. Other

6. For how many airmen and civilians are you the        .442    1.25    .719    1.93    -3.64
   immediate supervisor? (include only those who
   report directly to you)

7. Over the last six months, how many days have you     1.48     .98    1.79    1.34    -6.36*
   been TDY?
   1. None             2. 1-14 Days       3. 15-30 Days
   4. 31-60 Days       5. 61-90 Days      6. 91-120 Days
   7. 121-150 Days     8. 151-180 Days    9. 181 Days or more

8. Over the last year, how many times have you          1.50     .89    1.61     .94    -2.95
   been TDY?
   1. None             2. 1-5 Times       3. 6-10 Times
   4. 11-15 Times      5. 16-20 Times     6. 21-25 Times
   7. 26-30 Times      8. 31-35 Times     9. 36 Times or more

Table 3. Questionnaire Items: Summary Statistics (Cont'd)

                                                     Females (n=797)  Males (n=1594)
Item                                                    Mean      SD    Mean      SD

9. Was aircraft maintenance (43XXX) your first          1.40     .49    1.28     .45
   career field choice?
   1. Yes
   2. No

10. How similar is your assigned career field           3.07    1.84    2.38    1.63
    to your preferred career field?
    1. I was assigned to my preferred career field.
    2. Very similar
    3. Somewhat similar
    4. Not very similar
    5. Not at all similar

11. How difficult was aircraft maintenance technical    2.84     .84    3.24     .69
    training for you?
    1. Very difficult
    2. Somewhat difficult
    3. Fairly easy
    4. Very easy

12. Before you entered aircraft maintenance             3.09     .91    2.13    1.01
    technical training, how much mechanical
    experience did you have?
    1. Considerable experience
    2. Some experience
    3. Little experience
    4. No experience

13. Before entering aircraft maintenance                2.32    1.04    1.50     .75
    technical training, how much interest did
    you have in mechanics?
    1. Considerable interest
    2. Some interest
    3. Little interest
    4. No interest

14. If you left the Air Force, would you try to         3.57    1.29    2.26    1.20
    find a job in the mechanical area?
    1. Definitely
    2. Probably
    3. Not sure
    4. Probably not
    5. Definitely not

15. How do you find your job?                                   1.30    4.63    1.01
    1. Extremely Dull
    2. Very Dull
    3. Fairly Dull
    4. So-So
    5. Fairly Interesting
    6. Very Interesting
    7. Extremely Interesting
Table 3. Questionnaire Items: Summary Statistics (Cont'd)


Females (n=797)  Males (n=1594)

Item  Mean  SD  Mean  SD  t ratio^a

16. Is your work more or less difficult technically  3.46  1.14  3.78  1.05  -6.56*

than  you  expected  before  entering  the  431X1/X2 

AFSC? 

1.  Much  more  difficult 

2.  Slightly  more  difficult 

3.  It  is  what  I  expected 

4.  Slightly  less  difficult 

5.  Much  less  difficult 

17. Do you do more or less work where you get dirty  2.93  1.15  2.90  1.08  .73

than you expected before entering the 431X1/X2

AFSC?

1.  Much  more 

2.  Slightly  more 

3.  It  is  what  I  expected 

4.  Slightly  less 

5.  Much  less 

18.  Does  your  work  require  more  or  less  physical  2.66  1.14  3.22  .94  -11.94* 

strength  than  you  expected  before  entering  the 

431X1/X2  AFSC? 

1.  Much  more  strength 

2.  Slightly  more  strength 

3.  It  is  what  I  expected 

4.  Slightly  less  strength 
5. Much less strength

19.  Is  your  workload  heavier  or  lighter  than  you  2.95  1.12  3.04  1.14  -1.86 

expected  before  entering  the  431X1/X2  AFSC? 

1.  Much  heavier 

2.  Slightly  heavier 

3.  It  is  what  I  expected 

4.  Slightly  lighter 

5.  Much  lighter 

20.  Do  you  spend  more  or  less  time  working  under  2.44  1.18  2.34  1.13  1.84 

unfavorable  conditions  (heat,  cold,  bad  weather, 

etc.)  than  you  expected  before  entering  the 
431X1/X2  AFSC? 

1.  Much  more 

2.  Slightly  more 

3.  It  is  what  I  expected 

4.  Slightly  less 

5.  Much  less 


Table 3. Questionnaire Items: Summary Statistics (Cont'd)


Item


Females (n=797)  Males (n=1594)

Mean  SD  Mean  SD  t ratio^a


26. Is your workload now heavier or lighter  2.77  1.28  2.60  1.17  3.28

than  when  you  first  began  working  in  the 

431X1/X2  AFSC? 

1.  Much  heavier 

2.  Slightly  heavier 

3.  No  change 

4.  Slightly  lighter 

5.  Much  lighter 

27.  Do  you  now  spend  more  or  less  time  working  3.24  1.31  2.82  1.11  7.70* 

under  unfavorable  conditions  (heat,  cold, 

bad  weather,  etc.)  than  when  you  first  began 
working  in  the  431X1/X2  AFSC? 

1.  Much  more 

2.  Slightly  more 

3.  No  change 

4.  Slightly  less 
5. Much less


Are you satisfied with the Air Force in general?  2.86  1.25  3.15  1.26  -5.36*

1. Very satisfied

2. Fairly satisfied

3. Neither satisfied or dissatisfied

4. Somewhat dissatisfied

5. Very dissatisfied

How satisfied are you with your present Air Force job?  3.06  1.37  2.90  1.33  2.61

1. Very satisfied

2. Fairly satisfied

3. Neither satisfied or dissatisfied

4. Somewhat dissatisfied

5. Very dissatisfied

How much useful skill and experience are you  3.11  1.14  2.78  1.14  6.66*
gaining from your job?

1. A very large amount

2. A large amount

3. Some

4. Very little

5. None

How does your job utilize your talents?  3.15  1.34  3.37  1.43  -3.68

1. Not at all  5. Very well

2. Very little  6. Excellently

3. Fairly well  7. Perfectly

4. Quite well
Table 3. Questionnaire Items: Summary Statistics (Cont'd)


Item


Females (n=797)  Males (n=1594)
Mean  SD  Mean  SD


32. How does your job utilize your training?  3.33  1.44  3.66  1.45

1. Not at all  5. Very well

2. Very little  6. Excellently

3. Fairly well  7. Perfectly

4. Quite well

33. How satisfied are you with the sense of  4.43  1.72  4.54  1.64

accomplishment you gain from your work?

1. Extremely Dissatisfied  5. Slightly Satisfied

2. Very Dissatisfied  6. Very Satisfied

3. Slightly Dissatisfied  7. Extremely Satisfied

4. Neither Satisfied Nor Dissatisfied


34. If you had the chance, would you change to  2.18  1.24  2.39  1.27
another career field?

1. Definitely would

2. Probably would

3. Not sure

4. Probably would not

5. Definitely would not

35. If you had the chance, would you leave the  2.97  1.27  2.65  1.26
Air Force?

1. Definitely would

2. Probably would

3. Not sure

4. Probably would not

5. Definitely would not

Do you plan to reenlist?  2.32  1.10  2.25  1.06

1. No

2. Uncertain, Probably No

3. Uncertain, Probably Yes

4. Yes


37. How many of the tasks in your job can you  1.88  .819  1.78  .78

perform without technical assistance from
another person?

1. A very large number

2. A large number

3. Some

4. Very few

5. None

38. How often do coworkers volunteer to give  2.76  1.07  2.78  1.02

you technical assistance?

1. Very often

2. Often

3. Sometimes

4. Rarely

5. Never
Table 3. Questionnaire Items: Summary Statistics (Cont'd)


Females (n=797)  Males (n=1594)

Item  Mean  SD  Mean  SD  t ratio^a

39.  How  often  do  you  have  to  ask  for  technical  3.25  .75  3.26  .78  -.29 

assistance? 

1.  Very  often 

2.  Often 

3.  Sometimes 

4.  Rarely 

5.  Never 

40. How many of the physical tasks (lifting,  2.40  .87  2.26  .74  3.98*

moving  equipment,  reaching,  etc.)  in  your 

specialty  can  you  perform  without  assistance 
from  another  person? 

1.  All 

2.  Most 

3 .  Some 

4.  Few 

5 .  None 

41.  How  often  do  coworkers  volunteer  to  help  you  2.52  1.05  2.51  .99  .27 

with  physical  tasks  (lifting,  moving  equipment, 

reaching,  etc.)? 

1.  Very  often 

2.  Often 

3.  Sometimes 

4.  Rarely 

5.  Never 

42.  How  often  do  you  have  to  ask  for  help  doing  3.27  .83  3.22  .84  1.34 

physical  tasks?  (Lifting,  moving  equipment, 

reaching,  etc.) 

1.  Very  often 

2.  Often 

3.  Sometimes 

4.  Rarely 

5.  Never 

43.  How  many  of  the  tools  required  in  your  job  3.90  .91  3.76  .97  3.12 

are  difficult  to  use  because  of  their  size 

or  bulkiness? 

1.  A  very  large  number 

2.  A  large  number 

3 .  Some 

4.  Very  few 

5.  None 

6.  Not  applicable 
Table 3. Questionnaire Items: Summary Statistics (Cont'd)


Item


Females (n=797)  Males (n=1594)

Mean  SD  Mean  SD  t ratio^a


44.  How  many  of  the  tools  required  in  your  job  are  4.04  .85  4.35  .76  -1 

difficult  to  use  because  they  require  too  much 

strength? 

1.  A  very  large  number 

2.  A  large  number 

3.  Some 

4.  Very  few 
5. None

6.  Not  applicable 

45. How many of the tools required in your job are  4.50  .70  4.53  .69

difficult to use because it is hard to understand
how to operate them?

1. A very large number

2. A large number

3. Some

4. Very few

5. None

6. Not applicable

46. Is your immediate supervisor male or female?  1.02  .14  1.01  .12

1. Male

2. Female

47. Is your immediate supervisor military or  1.96  .19  1.95  .21

civilian?

1.  Civilian 

2.  Military 

48.  How  does  your  supervisor  judge  your  work?  2.97  .62  2.92  .68 

1.  Very  leniently 

2.  Somewhat  leniently 
3. In a fair way

4.  Somewhat  harshly 

5.  Very  harshly 

49.  How  often  does  your  immediate  supervisor  give  2.96  1.12  3.08  1.11 

you  recognition  for  a  job  well  done? 

1.  Very  often 

2.  Often 

3.  Sometimes 

4.  Rarely 

5.  Never 
Table 3. Questionnaire Items: Summary Statistics (Cont'd)


Females (n=797)  Males (n=1594)

Item  Mean  SD  Mean  SD  t ratio^a

50. When you first began this assignment, how much  2.86  1.09  2.45  .92  8.30*

confidence did your supervisor have that you

could do your job well, compared to your fellow
workers with the same amount of experience?

1.  A  very  large  amount 

2.  A  large  amount 

3 .  Some 

4.  Very  little 

5 .  None 

6.  Don't  know 

51.  How  much  confidence  does  your  supervisor  have  1.96  .83  1.96  .31  .03 

that  you  can  do  your  job  well  now,  compared  to 

your  fellow  workers  with  the  same  amount  of 
experience? 

1.  A  very  large  amount 

2.  A  large  amount 

3 .  Some 

4.  Very  little 

5.  None 

6.  Don't  know 

52.  How  would  you  rate  the  effectiveness  of  your  2.68  1.02  2.56  1.11  2.53 

duty  section  compared  to  other  43XXX  duty 

sections? 

1.  Much  more  effective 

2.  More  effective 

3.  About  the  same 

4. Slightly less effective

5.  Much  less  effective 

53.  Approximately  how  many  431X1/X2  personnel  are  5.68  3.08  6.17  2.75  -3.75* 

in  your  duty  section? 

1.  0-5  6.  26-30 

2.  6-10  7.  31-35 

3.  11-15  8.  36-40 

4.  16-20  9.  41  or  more 

5.  21-25 

54.  Approximately  what  is  the  proportion  of  female  2.39  1.14  1.94  .86  9.92* 

and  male  431X1/X2  personnel  in  your  duty  section? 

1. 0% female/100% male

2. 5% female/95% male

3. 10% female/90% male

4. 15% female/85% male

5. 20% female/80% male

6. 30% female/70% male

7. 40% female/60% male

8. 50% or more female/50% or less male


^a Bonferroni t(crit) = 3.75, p = .01

^b These items are not amenable to t-test comparisons. See Table 2.

*p < .01
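
The t ratios in Table 3 can be approximately reproduced from the printed means, standard deviations, and group sizes. The sketch below is an illustration only: it assumes an unequal-variance (Welch) form of the t ratio and the full group sizes (797 and 1594), neither of which is stated in the paper, so small differences from the printed values are to be expected. The function name `welch_t` is introduced here for the example.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Unequal-variance (Welch) t ratio for two independent group means."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error

T_CRIT = 3.75  # Bonferroni critical value quoted in the table footnote

# Females first, then males; values copied from Table 3.
t18 = welch_t(2.66, 1.14, 797, 3.22, 0.94, 1594)  # item 18: physical strength
t17 = welch_t(2.93, 1.15, 797, 2.90, 1.08, 1594)  # item 17: getting dirty

for label, t in (("item 18", t18), ("item 17", t17)):
    verdict = "significant" if abs(t) >= T_CRIT else "not significant"
    print(f"{label}: t = {t:.2f} ({verdict} against |t| >= {T_CRIT})")
```

Under these assumptions item 18 falls close to the printed -11.94 and well beyond the critical value, while item 17 stays far below it, which matches the pattern of asterisks in the table.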


Table 4. Summary of Significant Gender Discriminators


Females (n=521)  Males (n=1137)  Unique

Item  Mean  SD  Mean  SD  Contribution

1. If you left the Air Force, would you  3.57  1.29  2.26  1.20  18%

try to find a job in the mechanical area?

1. Definitely

2. Probably

3. Not sure

4. Probably not

5. Definitely not

2. Before you entered aircraft maintenance  3.09  .91  2.13  1.01  7%
technical training, how much mechanical
experience did you have?

1. Considerable experience

2. Some experience

3. Little experience

4. No experience

3. Does your work require more or less  2.66  1.14  3.22  .94  4%

physical strength than you expected
before entering the 431X1/X2 AFSC?

1. Much more strength

2. Slightly more strength

3. It is what I expected

4. Slightly less strength

5. Much less strength

4. If you had the chance, would you  2.97  1.27  2.65  1.26  2%

leave the Air Force?

1. Definitely would

2. Probably would

3. Not sure

4. Probably would not

5. Definitely would not

5. Approximately what is the proportion  2.39  1.14  1.94  .86  2%

of female and male 431X1/X2 personnel

in your duty section?

1. 0% female/100% male

2. 5% female/95% male

3. 10% female/90% male

4. 15% female/85% male

5. 20% female/80% male

6. 30% female/70% male

7. 40% female/60% male

8. 50% or more female/50% or less male

6. When you first began this assignment,  2.86  1.09  2.45  .92  1%

how much confidence did your supervisor
have that you could do your job well,
compared to your fellow workers with
the same amount of experience?

1. A very large amount

2. A large amount

3. Some

4. Very little

5. None

6. Don't know
Perhaps the information from these data of most interest to Air Force
managers is that a greater percentage of the women (54.38%) than of the men
(30.37%) in this study entered the aircraft maintenance career field because
it was the only area open or because they were persuaded by the recruiter.
Similarly, approximately 21% of the women compared to 12% of the men planned
to crosstrain out at the time of entry. While these differences in reason for
entering the career field and in career plans have no corresponding difference
in job attitudes and reenlistment intent, their impact in other areas such as
crosstraining rates and performance levels is not yet known.

It  is  also  of  interest  that  the  women  in  the  sample  reported  perceiving 
less  confidence  initially  from  their  supervisors  than  the  men.  This,  as  well 
as  the  women  expecting  the  job  to  require  less  strength  than  it  did,  could 
result  in  a  longer  time  being  required  for  the  women  to  adjust  to  the  job 
initially. These misconceptions on the part of the supervisors as well as
those entering the career field could be overcome through the dissemination of
more accurate information and efforts to dispel prejudice. As women report
less  interest  in  leaving  the  Air  Force  than  do  men,  efforts  to  improve  their 
utilization  and  adjustment  into  the  career  field  should  be  worthwhile. 

Further  analysis  of  these  data  will  be  conducted  using  the  results  of  the 
currently ongoing job analysis. Males and females can then be grouped by job
type  and  differences  between  males  and  females  in  the  same  job  type  and  across 
job  types  can  be  analyzed  to  investigate  the  effect  of  the  type  of  work 
performed  upon  these  areas  of  interest.  This,  along  with  the  investigation  of 
the  input  and  exit  populations,  utilization  patterns,  and  on-the-job 
performance  levels,  will  give  an  in-depth  view  of  the  career  field  and  the 
women  and  men  working  in  it. 
Utility of the Field Dependence Dimension in Military Settings

The field dependence - field independence dimension was defined
and  its  relationships  with  various  personality  dimensions,  social 
behaviors,  and  problem-solving  styles  were  identified.  In  the  military 
context,  the  utility  of  the  field  dependence  construct  and  of  various 
measures  of  field  dependence  were  critically  reviewed.  Empirical 
studies  which  used  military  samples  or  which  examined  areas  directly 
related  to  the  military,  for  example,  map  reading  skills,  target 
detection,  and  computer  skills  were  examined.  Then  conceptual  and 
methodological  issues  surrounding  the  field  dependence  construct  were 
discussed. These issues focused on the relationships between field
dependence  and  intelligence,  sex  differences  in  field  dependence, 
relationships  among  various  measures  of  field  dependence,  components 
underlying  performance  on  field  dependence  tests,  and  design  problems 
with  much  of  the  research  in  this  area.  Finally,  a  set  of  guidelines 
was  presented  for  the  future  use  of  the  construct  and  its  measures  in 
military  settings.  These  guidelines  focused  on,  among  other  points, 
the  importance  of  selecting  appropriate  measures  of  the  construct,  the 
use  of  multivariate  as  well  as  univariate  designs,  and  controls  for 
various potential sources of confounds.



UTILITY  OF  THE  FIELD  DEPENDENCE  DIMENSION 
IN  MILITARY  SETTINGS 

Robert  Loo  Ph.D. 

Canadian  Forces  Personnel  Applied  Research  Unit 
4900  Yonge  Street 
Willowdale, Ontario


INTRODUCTION 

Beginning with visual perception studies using the Rod-and-Frame Test,
Witkin  and  his  associates  (e.g.,  Witkin,  Lewis,  Hertzman,  Machover,  Meissner  & 
Wapner,  1954)  found  that  subjects  could  be  differentiated  along  a  continuum 
based  on  their  ability  to  separate  a  given  "figure"  from  "ground"  in  the 
visual  field.  They  labelled  this  continuum  perceptual  field-dependence- 
independence.  Field  dependence  (FD)  can  be  described  as  a  "fused"  or  "global" 
way  of  perceiving;  perception  is  dominated  by  the  overall  organization  of  the 
(visual)  field.  Field  independence  (FI),  on  the  other  hand,  can  be  described 
as  an  articulated  way  of  perceiving;  parts  of  the  field  are  perceived  as 
discrete  from  the  organized  background. 

In  the  30-odd  years  since  its  introduction  into  the  scientific 
literature,  the  field  dependence  dimension  has  been  extensively  examined. 
Before  his  unfortunate  death  in  1979,  Witkin  (Witkin  &  Goodenough,  1977; 
Witkin,  Goodenough  &  Oltman,  1979)  published  two  papers  which  are  excellent 
overviews  on  the  status  of  the  field  dependence  construct.  In  one  paper, 
Witkin  &  Goodenough  (1977)  presented  a  comprehensive  examination  of  the 
relationships  between  field  dependence  and  interpersonal  behavior. 

Essentially,  they  reported  that  FD  persons  have  a  strong  interpersonal 
orientation  in  terms  of  being  emotionally  open,  gravitating  toward  social 
situations, and preferring to be physically close to others, whereas FI
persons have an impersonal orientation in terms of not being very interested
in others, showing both physical and psychological distancing from people, and
showing a preference for nonsocial situations. In the other paper, Witkin
(Witkin et al., 1979) updated the status of his psychological differentiation
construct, a higher-order construct of which field dependence is the most
studied  lower-order  construct.  An  important  focus  of  this  paper,  as  compared 
with  Witkin's  previous  overviews,  was  the  attention  given  to 
neurophysiological  research  published  during  the  1970's  on  the  relationships 
between  field  dependence  and,  say,  lateral  specialization  of  the  hemispheres 
or  patterns  of  electroencephalogram  (EEG)  recordings. 

MEASURES  OF  FD 

The  pioneering  research  used  the  Rod-and-Frame  Test  (RFT),  a  test  in 
which  the  subject,  while  seated  in  a  darkened  room,  must  adjust  to  the  upright 
a tilted luminous rod centered within a tilted luminous frame. Since those
pioneering days, Witkin and his associates have developed several other tests
of field dependence. Several tests are variations of the RFT; for example,
the portable RFT, the Tilting-Room-Tilting-Chair Test, the Room-Adjustment
Test,  and  the  Body-Adjustment  Test.  However,  another  line  of  test  development 
produced  the  Embedded  Figures  Test  (EFT)  and  a  group-administered  version  of 
the  EFT.  These  embedded-figures  tests  are  more  practical  in  the  sense  that 
they do not require cumbersome equipment, special rooms or periods of dark
adaptation. In addition, the Group Embedded Figures Test (GEFT) permits group
testing whereas all the other tests mentioned require individual testing, a
time-consuming and costly way of gathering data.

Numerous  studies  have  examined  the  interrelationships  among  the  various 
measures of field dependence. Arbuthnot (1972) reviewed 40 such studies which
reported  a  total  of  122  correlations  between  field  dependence  measures.  He 
noted,  for  example,  that  the  mean  correlation  was  0.54  for  the  21  correlations 
reported  between  scores  on  the  RFT  and  Witkin's  original  24-item  EFT. 

FD  AND  OTHER  VARIABLES 

There  has  been  much  interest  in  relating  field  dependence  to  variables 
in  the  personality  and  attitudes  domains  among  others.  Table  1  identifies 
some  findings  reported  in  the  literature  over  the  past  25  years.  One  caveat 
in  examining  the  table  is  that  for  each  area  there  are  studies  which  have 
failed  to  obtain  the  described  relationships  and  even  studies  which  have 
obtained  contradictory  results.  However,  there  appears  to  be  sufficient 
evidence to support the descriptions in Table 1. The two papers by Witkin
(Witkin  &  Goodenough,  1977;  Witkin  et  al.,  1979)  provide  references  for  these 
various  relationships. 

Table  2  presents  some  applications  of  field  dependence  which  are 
military  based  or  directly  applicable  to  the  military.  Fine  (e.g.,  1980),  at 
the  U.S.  Army  Research  Institute  of  Environmental  Medicine,  has  demonstrated 
the  utility  of  the  field  dependence  construct  in  studying  (visual)  sensation 
and  perception  in  a  military  context  where  U.S.  soldiers  served  as 
participants. Field dependence has also been used as one variable in
describing  human  operator  performance  in  man-machine  interfaces;  for  example, 
Howard  (1979)  has  shown  the  usefulness  of  field  dependence  in  predicting 
console  operator  performance  on  the  Patriot  weapon  system.  Moving  from 
performance  in  man-machine  systems  to  performance  "in  the  field",  Belletti  and 
Anthony  (1979)  have  shown  that  field  independent  persons  on  an  artillery 
forward  observers  course  performed  better  than  field  dependent  persons. 

Studies  have  also  shown  the  utility  of  field  dependence  in  examining  pilot 
performance  (e.g.,  Cullen,  1969;  Kennedy,  1972),  vehicle  driver  performance 
(e.g., Loo, 1978a), and performance in vehicle simulators (e.g., Barrett
et al., 1969).

Stasz  and  Thorndyke  (1980)  have  shown  that  field  dependence  is  related 
to  map  reading  skills  where  such  skills  are  relevant  to  many  areas  in  the 
military.  Loo  (1981)  has  suggested  that  field  dependence  may  be  a  useful 
construct  in  selecting  and  in  understanding  performance  in  tactical  command, 
control, and communication (C3) systems. Finally, work by Fine (1972) with
military samples and by Loo (1978b) with civilian samples has shown
relationships between field dependence and symptoms of psychopathology,
namely, neuroticism.


CRITIQUE  OF  FIELD  DEPENDENCE 

Many have taken up the field dependence construct in their research
since Witkin's early work; however, several researchers (e.g., Arbuthnot,
1972; Vernon, 1972; Wachtel, 1972) have expressed concern over the validity of
some claims made by Witkin and over the methodology used in many studies in
this area.


Performance  Components  of  FD  Measures 

It is suggested that, on the basis of determinants of performance,
typical measures of field dependence can be clustered into two groups:
embedded-figures tests and adjustment tests such as the Rod-and-Frame Test.
This position differs from that of Witkin, who would claim that these tests
have high convergent validity in the measurement of field dependence.

Perhaps  the  earliest  evidence  for  the  two-clusters  position  espoused  in 
the  present  paper  can  be  found  in  Goodenough  and  Karp's  (1961)  factor-analytic 
study.  Although  they  were  concerned  with  the  relationship  between  field 
dependence  and  intellectual  functioning,  their  results  can  be  interpreted  to 
suggest  that  the  various  field-dependence  variables  they  used  do  not  clearly 
identify  a  homogeneous  clustering  of  field-dependence  measures.  They 
performed  two  oblique  factor  analyses,  one  on  data  from  a  group  of  25  boys  and 
25  girls  (Group  A)  and  the  other  on  data  from  a  group  of  30  boys  (Group  B). 
Factor  loadings  in  the  solution  for  Group  A  showed  that  no  one  factor 
identified  all  eight  field-dependence  variables.  However,  loadings  for  the 
even  smaller  Group  B  showed  that  Factor  III  identified  all  five 
field-dependence  variables  used  with  that  group.  In  both  oblique  solutions, 
each  of  the  various  field-dependence  variables  typically  loaded  on  more  than 
one  factor.  Unfortunately,  the  authors  did  not  report  the  correlations  among 
the  primary  factors  or  second-order  analyses.  Obviously,  oblique  primaries 
indicated  that  higher-order  analyses  should  have  been  performed,  and  if  they 
had  been,  then  the  results  might  have  been  more  meaningful  both  for  their 
purposes  and  for  those  of  the  present  paper. 

A  second  more  comprehensive  and  direct  source  of  evidence  for  the 
position  is  found  in  Vernon's  (1972)  large-scale  factor-analytic  study  with 
adolescents. First, his two Rod-and-frame Test scores intercorrelated higher
(0.63) than the two scores did with Embedded-figures Test scores (0.32, 0.40);
and secondly, Vernon found that when general intelligence was partialed out of
all correlations, only the intercorrelation between the two Rod-and-frame Test
scores  remained  significant  (0.53),  while  the  remainder  of  the  correlations, 
including  those  between  the  two  Rod-and-frame  Test  scores  and  the 
Embedded-figures  Test  scores  (0.15,  0.18),  became  nonsignificant.  These 
points  suggest  that  performance  on  the  Rod-and-frame  Test  is  essentially 
independent  of  intelligence,  whereas  performance  on  the  Embedded-figures  Test 
is  highly  related  to  intelligence  level. 
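
The effect of partialing out general intelligence can be illustrated with the standard first-order partial correlation formula. This is a sketch for illustration only: the zero-order value 0.32 is quoted in the text, but the two correlations with intelligence (0.30 and 0.60) are hypothetical values chosen for the example and are not figures reported by Vernon (1972).

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    return numerator / denominator

r_rft_eft = 0.32  # zero-order RFT/EFT correlation quoted in the text
r_rft_iq = 0.30   # HYPOTHETICAL: RFT score with general intelligence
r_eft_iq = 0.60   # HYPOTHETICAL: EFT score with general intelligence

r_partial = partial_correlation(r_rft_eft, r_rft_iq, r_eft_iq)
print(f"zero-order r = {r_rft_eft:.2f}, partial r = {r_partial:.2f}")
```

With these illustrative inputs the correlation shrinks noticeably once intelligence is held constant, which is the kind of drop described above when one test is much more strongly tied to intelligence than the other.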


Finally,  Arbuthnot  (1972)  reviewed  40  studies  which  reported 
correlations among various field-dependence measures. He noted that the mean
correlation  between  performance  on  the  Rod-and-frame  Test  and  the  24-item 
Embedded-figures  Test  from  21  correlations  was  0.54,  and  between  the 
Rod-and-frame  Test  and  the  12-item  Embedded-figures  Test  from  nine 
correlations  was  0.37.  The  magnitude  of  the  correlations  indicated  the  low 
common  variance  shared  by  these  two  clusters  of  measures  and  thus  the  lack  of 
convergent  validity. 
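
The shared variance implied by these mean correlations is simply the squared correlation coefficient. The short sketch below works through the two values quoted from Arbuthnot (1972); the function name `shared_variance` is introduced here only for the example.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2

mean_correlations = (
    ("RFT vs. 24-item EFT", 0.54),
    ("RFT vs. 12-item EFT", 0.37),
)

for label, r in mean_correlations:
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.0%}")
```

On these figures the two clusters of tests share well under a third of their variance, which is the basis for the convergent-validity concern raised above.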


COMPONENTS  UNDERLYING  PERFORMANCE  IN  EMBEDDED-FIGURES  AND  ADJUSTMENT  TESTS 


Several  possible  components  or  factors  affecting  performance  on 
field-dependence  tests  may  be  extracted  to  explain  the  low  covariation  between 
tests  in  the  two  clusters,  embedded-figures  and  adjustment  tests,  and  the  high 
covariation  within  clusters.  There  seem  to  be  components  required  for  the 
solving of embedded-figures tests but not for adjustment tests and vice versa.

One  distinction  between  the  various  embedded-figures  and  adjustment 
tests such as the Rod-and-frame Test, the portable Rod-and-frame Test, the
Tilting-room-tilting-chair  Test,  Room-adjustment  test,  and  Body-adjustment 
Test  is  the  requirement  in  the  embedded-figures  tests  to  remember  for  at  least 
a  few  seconds  the  given  simple  geometric  form  in  order  to  find  it  when  given 
the  complex  form.  Secondly,  a  perceptual-motor  component  also  underlies 
performance  on  embedded-figures  tests  in  that  the  subject  is  required  not  only 
to  remember  a  simple  form  but  also  to  outline  it  within  the  complex  form. 

A  third  factor  found  only  in  embedded-figures  tests  is 
reversible-perspective  items.  Vojtisek  and  Magaro  (1974)  reported  that 
psychiatric  patients  had  greater  difficulty  solving  items  with  reversible 
perspective than with other items on embedded-figures tests. On the other
hand, Loo (1978c) reported that normal females were at least as successful in
solving reversible-perspective items as in solving the remaining items in both
individual and group forms of the Embedded-figures Test. However, he also
found that when extreme groups were formed based on ease and difficulty of
solving reversible-perspective items, greater difficulty was associated with
greater sociability and more minor "psychiatric" complaints than was greater
ease of solving such items. Finally, the time limits placed on solving
items in embedded-figures tests also distinguish these tests from
adjustment  tests. 

At least one factor or component is required in the adjustment tests
but  not  in  the  embedded-figures  tests.  The  adjustment  tests  require  that  a 
kinesthetic  component  be  linked  with  the  visuospatial  component  to  achieve 
high  accuracy  in  these  test  situations. 

Having  stressed  the  components  required  in  one  cluster  of  tests  versus 
the  other,  the  components  required  in  the  solution  of  both  clusters  of  tests 
are  itemized.  As  in  all  tests,  an  optimal  level  of  motivation  and  arousal 
(e.g.,  Oltman,  1964)  on  the  subjects’  part  is  necessary  to  achieve  "best" 
performance.  In  addition,  the  pattern  of  eye  movements  is  related  to 
performance  in  both  clusters  of  tests  if  visual  stimuli  are  used.  For 
example, Blowers and O'Connor (1978) found that with the Rod-and-frame Test,
field-independent subjects, unlike field-dependent subjects, showed
large-magnitude eye movements and high rates of eye movements.

Additional,  although  indirect,  support  for  the  eye-movements  component 
is  found  in  Baron's  (1978)  study  which  investigated  the  eye  movements  of  85 
children  during  their  television  watching.  She  found  that  field-independent 
subjects  oriented  to  the  target  words  faster,  had  more  fixations  on  target, 
and  had  longer  fixation  durations  than  did  field-dependent  subjects. 

An extension of the research relating eye movements and field
dependence to include eye tracking may prove fruitful. Recently, several
groups of researchers (e.g., Holzman et al., 1976; Kuechenmeister et al.,
1977) reported impaired eye tracking in various groups of psychotic patients.
Essentially, such studies found that in a simple test of smooth-pursuit eye
movements, a high proportion of psychotic patients showed impaired performance
which was due to velocity arrests. Further investigation (e.g., Holzman
et al., 1976) suggested that velocity arrests were due not to voluntary
processes but to neurophysiological dysfunctions probably located in the brain
stem. However, recent work by Acker and Toone (1978) demonstrated that
impaired eye tracking could be induced in normals by the addition of a
distracting task. They concluded that, contrary to previous research which
stressed a neurophysiological deficit, superficial inattention or deficits in
selective attention might account for the schizophrenics' poor performances.
In any event, research in this area is very active and may prove very fruitful
for many areas of psychology.

In  addition  to  eye  movements  and  eye  tracking,  lateral  eye  movements  as 
studied  in  relation  to  information  processing  and  hemispheric  dominance  (e.g., 
Huang  and  Byrne,  1978)  may  be  of  importance  in  the  study  of  field  dependence. 
Along  the  same  lines,  handedness  and  other  laterality  indicators  are  of 
significant  interest  in  relating  cortical  organization  to  the  field-dependence 
dimension  (e.g.,  O'Connor  and  Shaw,  1978). 

The  comprehensive  examination  through  multivariate  and  univariate 
techniques  of  the  interrelationships  involving  eye  movements,  eye  tracking, 
lateral  eye  movements,  laterality,  and  field-dependence  measures  in  various 
populations  might  yield  important  information  on  the  role  of  cortical  and 
subcortical  structures  and  processes  in  the  field-dependence  dimension. 

FIELD-DEPENDENCE  MEASURES  AND  INTELLIGENCE 

The  controversy  over  the  relationship  between  field  dependence  and 
intelligence  is  long-standing  and  unresolved.  Findings  from  studies  which 
examined  the  relationship  tend  to  indicate  that  the  relationship  between 
scores on performance subtests (Block Design, Object Assembly, Picture
Completion)  from  both  the  Wechsler  Intelligence  Scale  for  Children  and 
Wechsler  Adult  Intelligence  Scale  and  field-dependence  measures  is  carried  by 
the  embedded-figures  tests  and  not  the  adjustment  tests.  The  rotated  factor 
matrix  for  Group  A  reported  by  Goodenough  and  Karp  (1961)  showed  that  the 
Room-adjustment  Test  and  Body-adjustment  Test  loaded  highly  on  factors 
separate  from  subtests  on  the  Wechsler  Intelligence  Scale  for  Children.  In 
contrast,  the  rotated  factor  matrix  for  Group  B  showed  that  all  the 
field-dependence  tests  and  the  three  subtests,  Block  Design,  Object  Assembly, 
and  Picture  Completion,  loaded  on  one  factor. 

Although the focus has been on relating field-dependence measures to
performance subtests from intelligence scales, some researchers identified
relationships between embedded-figures but not adjustment tests and verbal
subtests. Two groups of researchers (Karp and Silberman, 1966; Riley and
Denmark, 1974) found that with samples of black subjects, performance on the
children and adult forms of the Embedded-figures Test was related to
performance on verbal subtests from Wechsler's intelligence scales. More
recently, O'Leary et al. (1977) found that field dependence as measured by the
group Embedded-figures Test was related to several verbal and performance
subtests on the Wechsler-Bellevue Intelligence Scale for both alcoholic and
nonalcoholic groups of males. In contrast to these findings, Vernon (1972)
noted that when intelligence was held constant the correlation between the two
Rod-and-frame Test scores was only slightly attenuated from 0.63 to 0.53,
while the correlations between these two scores and the Embedded-figures Test
scores were reduced in magnitude by half to nonsignificant levels (0.15, 0.18).
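
Holding intelligence constant, as in the Vernon analysis, is conventionally
done with a first-order partial correlation. The sketch below is a minimal
illustration of that standard formula and not a reconstruction of Vernon's
computation. The function name and every input value are hypothetical and are
not taken from any study cited here.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical inputs for illustration only (NOT Vernon's data).
# Case 1: two test scores that are only weakly tied to intelligence (z).
weak_link = partial_corr(r_xy=0.60, r_xz=0.20, r_yz=0.20)
# Case 2: two test scores that are both strongly tied to intelligence (z).
strong_link = partial_corr(r_xy=0.40, r_xz=0.60, r_yz=0.60)

print(round(weak_link, 3), round(strong_link, 3))
```

When both measures correlate only weakly with the controlled variable, the
partial correlation stays close to the zero-order value. When both correlate
strongly with it, most of the original association disappears. This is the
pattern the text describes for the adjustment and embedded-figures tests
respectively.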

Findings such as these suggest that performance on embedded-figures
tests is highly related to performance on both performance and verbal
intelligence tests. On the other hand, performance on adjustment tests such
as the Rod-and-frame Test is apparently only slightly related to performance
on intelligence tests. The differing relationships between embedded-figures
and adjustment tests indicate that critics who state that field dependence is
highly related to or even the same as intelligence must qualify their
statement as applying to embedded-figures tests only (see Fine, 1973, Note 5).

The finding that performance on verbal subtests is related to
performance on embedded-figures tests (O'Leary et al., 1977; Riley and
Denmark, 1974), combined with the finding that sex differences exist in
reported strategies for solving visuospatial tests, suggests a further
component, verbal mediation, underlying performance on embedded-figures and
possibly, too, adjustment tests. It is suggested that greater field
independence as measured by embedded-figures tests may be due, in part, to the
effective use of verbal mediation by subjects in their problem-solving
approach to test items. Loo and Townsend (1977) found that greater field
independence, as measured by the group Embedded-figures Test, was associated
with lower impulsivity, specifically slower decision time. This finding
provided indirect support for a possible verbal mediation component. Verbal
mediation, a time-dependent and reflective behavior, would necessitate low
impulsivity and slow decision times.

FUTURE  DIRECTIONS  FOR  APPLICATIONS  OF  FD 


Methodology 

Given the number and variety of components underlying performance in
the two clusters of tests, embedded-figures and adjustment tests, it is
obvious that the understanding of the high-order construct field dependence
and of its relationship to other cognitive, personality, and performance
dimensions requires the execution of comprehensive rather than limited studies
which simply conduct one-way analyses of variance or Pearson correlations. It
is recommended that studies employ samples which adequately cover the range of
possible scores on field-dependence measures, that multiple measures and
scores of field dependence be used as within-subject variables, and that
multivariate and multiple regression techniques be considered.

Military  Applications 

Military applications of the field dependence construct in selection
and training may show their worth in the following major areas.

High technology environments. The revolutionary advances in computer
hardware and software are penetrating all areas of the military and this trend
will continue over the remainder of the century. The potential uses of field
dependence in areas such as computer-based management information systems
(e.g., Zmud, 1980) and command, control and communication (C³) systems
(e.g., Loo, 1981) have been identified.


Vehicle operation. The visual-spatial and kinesthetic components
associated with field dependence tests and the findings of past research with
vehicle driving (e.g., Loo, 1978a) and pilots (e.g., Cullen et al., 1969)
certainly support the continued use of field dependence in this area.

Visual-spatial  tasks.  Aside  from  the  visual-spatial  requirements  for 
vehicle  operation  as  noted  above,  many  military  occupations  require 
performance  in  complex  visual-spatial  work  environments.  For  example,  field 
dependence  may  be  useful  where  visual  (CRT)  displays  are  used  extensively 
(e.g.,  radar  operators,  air  traffic  controllers,  air  weapons  controllers), 
where  map  reading  and/or  photo  interpretation  are  essential  activities,  and 
where  visual  disembedding  is  critical  such  as  for  forward  observers  in  the 
artillery  and  air  observers  in  search  and  rescue. 

Decision  making  processes.  Field  dependence,  as  well  as  other 
cognitive styles, may be useful in the area of decision making and decision
analysis.  Commanders  at  all  levels  are  decision  makers  who  must  deal  with 
information/communications  activities  under  varying  conditions  of  ambiguity, 
uncertainty,  conflict,  and  ignorance;  thus  field  dependence  may  be  a  useful 
construct  in  examining  decision  making  (see  Loo,  1981;  Witkin  &  Goodenough, 
1977). 

Drug abuse. Drug abuse in terms of alcohol, cannabis, and other drugs
is a concern in the military. The field dependence construct has already
shown utility in the study of addictions (e.g., O'Leary et al., 1977) and
future research should add further to the literature.


REFERENCES 

Acker, W. & Toone, B. Attention, eye tracking and schizophrenia. British
Journal of Social and Clinical Psychology, 1978, 17, 173-181.

Arbuthnot, J. Cautionary note on measurement of field independence.
Perceptual and Motor Skills, 1972, 35, 479-483.

Baron, L.J. Relating eye movements to reading proficiency, field articulation
and stimulus mode. Paper presented at the meeting of the Canadian
Psychological Association, Ottawa, June 1978.

Barrett, G.V., Thornton, C.L. & Cabe, P.A. Relation between embedded figures
test performance and simulator behavior. Journal of Applied
Psychology, 1969, 53, 253-254.

Belletti, H.E. & Anthony, J.G. The embedded figures test: its relationship to
forward observer performance. Paper presented at the meeting of the
Military Testing Association, San Antonio, November 1979.

Blowers, G.H. & O'Connor, K.P. Relations of eye movements to errors on the
rod-and-frame test. Perceptual and Motor Skills, 1978, 46, 719-725.


Cullen, J.F., Harper, C.R. & Kidera, G.J. Perceptual style differences
between airline pilots and engineers. Aerospace Medicine, 1969, 40,
407-408.

Fine, B.J. Field-dependent introvert and neuroticism: Eysenck and Witkin
united. Psychological Reports, 1972, 31, 939-956.

Fine, B.J. Field-dependence-independence as "sensitivity" of the nervous
system: supportive evidence with color and weight discrimination.
Perceptual and Motor Skills, 1973, 32, 287-295.

Fine, B.J. & Kobrick, J.L. Field dependence, practice, and low illumination
as related to the Farnsworth-Munsell 100-hue test. Perceptual and
Motor Skills, 1980, 51, 1167-1177.

Goodenough, D.R. & Karp, S.A. Field dependence and intellectual functioning.
Journal of Abnormal and Social Psychology, 1961, 63, 241-246.

Holzman, P.S., Levy, D.L. & Proctor, L.R. Smooth pursuit eye movements,
attention, and schizophrenia. Archives of General Psychiatry, 1976,
33, 1415-1420.

Howard, C.W. Psychological profiles and performance characteristics of
tactical console operators. Paper presented at the meeting of the
Military Testing Association, San Antonio, November 1979.

Huang, M. & Byrne, B. Cognitive style and lateral eye movements. British
Journal of Psychology, 1978, 69, 85-90.

Karp, S.A. & Silberman, L. Field dependence, body sophistication, and
socioeconomic status. Research Reports, 1966, 2, 1-9, Sinai Hospital,
Baltimore.

Kennedy, R.S. The relationship between habituation to vestibular stimulation
and vigilance: individual differences and subsidiary problems
(Doctoral dissertation, University of Rochester, 1972). Dissertation
Abstracts International, 1972, 32, 2374B-2375B.

Kuechenmeister, C.A., Linton, P.H., Mueller, T.V. & White, H.B. Eye tracking in
relation to age, sex, and illness. Archives of General Psychiatry,
1977, 34, 578-579.

Loo, R. Individual differences and the perception of traffic signs. Human
Factors, 1978a, 20, 65-74.

Loo, R. Relationship of extraversion and field dependence to neuroticism.
Psychology, 1978b, 21, 56-66.

Loo, R. Personality dimensions and reversible perspective in embedded figures
test. Perceptual and Motor Skills, 1978c, 46, 1016-1018.


Loo, R. Individual differences as human factors in tactical communications
systems. Paper presented at the meeting of the Human Factors Association
of Canada, Toronto, October 1981.

Loo, R. & Townsend, P.J. Components underlying the relation between field
dependence and extraversion. Perceptual and Motor Skills, 1977, 45,
528-530.

O'Connor, K.P. & Shaw, J.C. Field dependence, laterality and the EEG.
Biological Psychology, 1978, 6, 93-109.

O'Leary, M.R., Donovan, D.M. & Chaney, E.F. The relationship of perceptual
field orientation to measures of cognitive functioning and current
adaptive abilities in alcoholics and nonalcoholics. Journal of Nervous
and Mental Disease, 1977, 165, 275-282.

Oltman, P.K. Field dependence and arousal. Perceptual and Motor Skills, 1964,
19, 441.

Riley, R.T. & Denmark, F.L. Field dependence and measures of intelligence:
some reconsiderations. Social Behavior and Personality, 1974, 2, 25-29.

Stasz, C. & Thorndyke, P.W. The influence of visual-spatial ability and study
procedures on map learning skill. Rand Corp / N-1501-ONR, 1980.

Vernon, P.E. The distinctiveness of field independence. Journal of
Personality, 1972, 40, 366-391.

Vojtisek, J.E. & Magaro, P.A. The two factors present in the embedded figures
test and a suggested short form for hospitalized psychiatric patients.
Journal of Consulting and Clinical Psychology, 1974, 42, 554-558.

Wachtel, P.L. Field dependence and psychological differentiation:
re-examination. Perceptual and Motor Skills, 1972, 35, 179-189.

Witkin, H.A. & Goodenough, D.R. Field dependence and interpersonal behavior.
Psychological Bulletin, 1977, 84, 661-689.

Witkin, H.A., Goodenough, D.R. & Oltman, P.K. Psychological differentiation:
current status. Journal of Personality and Social Psychology, 1979,
37, 1127-1145.

Witkin, H.A., Lewis, H.B., Hertzman, M., Machover, K., Meissner, P.B. &
Wagner, S. Personality through perception. Harper: New York, 1954.

Zmud, R.W. An information processing conceptualization of the systematic-
heuristic cognitive style. Manuscript from author at Georgia State
University, 1980.


TABLE 1

CORRELATES OF FIELD DEPENDENCE-INDEPENDENCE

AREA                     | GREATER FD                   | GREATER FI
-------------------------+------------------------------+-------------------------
Autonomy in social       | Low                          | High
situations               |                              |
Motivation orientation   | Extrinsic                    | Intrinsic
Sociability              | High                         | Low
Impulsivity              | High                         | Low
Self-reliance            | Low                          | High
Vocational interests     | Humanitarian-Helping         | Theoretic (analytic)
& Career choice          | domains                      | - Artistic domains
Sex differences          | Females tend to be more FD   | Males tend to be more FI
Anxiety                  | High                         | Low
Self-reported personal   | In general, FD people tend   | In contrast, the
attributes               | to describe themselves and   | descriptions of FI
                         | to be described by others    | people include, among
                         | in such terms as friendly,   | other characteristics,
                         | considerate, warm,           | inconsiderate, rude,
                         | affectionate, polite,        | demanding, ambitious,
                         | tactful, accommodating,      | interested in power,
                         | nonevaluative and accepting  | opportunistic, and
                         | of others, like people and   | manipulate people as a
                         | are liked by others, and     | means of achieving
                         | make others feel             | personal ends
                         | comfortable with them        |
Neurophysiological       | Low                          | High
Differentiation          |                              |
Psychopathology          | Identity problems,           | Delusions, outward
                         | overdependence, inadequate   | aggressiveness,
                         | emotional controls           | overideation...


TABLE  2 


APPLICATIONS  OF  FD 

Sensation  &  Perception 

Human  Operator  Performance 
•  Tactical  console  operators 

Forward  Observation  Performance 

Pilot  Performance 

Vehicle  Driver  Performance 

Performance  in  Simulators 

Map  Reading 

Commanders  in  Systems 
Psychopathology 


T & E Rating With Objectively Scored
Applicant  Appraisal  Questionnaires 

Thomas J. Lyons, Ph.D.
Personnel  Psychologist 


A procedure based on a content validity strategy has been developed for
constructing Objectively Scored Applicant Appraisal Questionnaires (OSAAQ)
to replace traditional ratings of training and experience (T & E Ratings).
There are both technical and administrative problems with traditional T & E
Ratings that depend on examiner judgement to evaluate narrative responses
to open ended questions against general rating guides or benchmarks. An
OSAAQ exam consists of constructed response questions (e.g., multiple choice,
check list) with an objective scoring key for evaluating applicant responses.
This makes OSAAQ items as appropriate for automated processing and scoring
as objective written test items and minimizes administrative costs with large
numbers of applicants. Objective item formats and scoring procedures of
OSAAQ exams also ensure consistent evaluation of applicant qualifications.
General procedures for constructing OSAAQ exams are described with a
discussion of technical and administrative issues related to OSAAQ exam
development projects for several occupations.

The model described in this paper is designed to improve the efficiency, job
relatedness and reliability of unassembled exams used to evaluate applicants
for high volume Federal occupations. Unassembled exams refer to a variety
of procedures for rating training and experience (T & E ratings) which
require raters (usually personnel staffing or occupational specialists)
to evaluate narrative information summarized on application and/or
supplemental forms. These procedures are typically used to examine for full
performance level jobs and entry level professional and technical jobs
requiring specialized training or experience. Although T & E ratings are
usually practical to apply with small numbers of applicants, they become
costly to administer when the volume of applications is high. It is also
difficult to ensure adequate variability and consistency in applicant ratings
for traditional T & E ratings that are applied to high volume occupations.

These concerns have led to the development of a new unassembled examining
methodology for constructing Objectively Scored Applicant Appraisal
Questionnaires (OSAAQ) that is practical to use and capable of objectively
differentiating among large numbers of applicants on job related factors.

Background 

The OSAAQ exam development methodology is used to develop job related
questionnaires consisting of items with constructed response formats (e.g.,
multiple choice, check list, rating scales) which are used as selection
instruments for high volume jobs where an evaluation of experiences, education
and other job related accomplishments is the appropriate examining strategy.
This type of item format makes it possible to develop an optical scan form for
automated processing and scoring of applicant responses. The cost and
time estimates for processing optical scan forms are very similar to those
required to process written test answer sheets. The use of OSAAQ's makes
it possible to process large numbers of applications in an economical manner
and provides a standard measurement instrument that assures a high level of
objectivity in the collection and evaluation of job related applicant
information.


Most applicant questionnaires with objective scoring keys used in personnel
selection have been based on a criterion related validity model (Van Rijin,
1980). Scoring keys are empirically derived for items which differentiate
the job performance of incumbents. The research literature reports substantial
support for the validity of questionnaire items from biographical information
blanks and weighted application blanks based on empirically derived scoring
keys (Owens, 1976). However, few of the studies cited in the literature
provide any evidence for the job relatedness (i.e., item content samples
important aspects of the job) of items used. It is not uncommon for background
variables such as occupation/salary of father, place of residence, or age to
be the type of items that are predictive of job success in many occupations.

The likelihood of legal challenge, questions of test fairness and problems
with public acceptance make the use of non job related (from a content validity
frame of reference) questionnaire items unacceptable in public sector selection
programs regardless of their empirical validity (Pace and Schoenfeldt, 1977).

Traditional T & E ratings are usually based on a content validity strategy
and contain questions and scoring procedures which are closely related to
the job studied. Typically, applicants provide narrative responses to questions
on an application or supplemental form describing their experience, education
and achievements relevant to the job. These narrative responses are reviewed
by raters (preferably trained subject matter experts) who assign points based
on scoring protocols usually anchored with benchmarks describing several levels
of experience, education and achievements. Requiring applicants to provide
narrative responses to open ended questions and then utilizing rater judgements
to evaluate these responses limits the objectivity of traditional T & E rating
procedures (Levine and Flory, 197b).

The OSAAQ exam development methodology combines features from biographical
information blanks and traditional T & E ratings. Item formats consist of
constructed response questions that result in a standard measurement
instrument with an objective scoring key. The content of items is clearly
job related and based on a content validation strategy. The development
of OSAAQ items based on a content validity strategy requires that specific
procedures be followed to provide evidence for the job relatedness, clarity
and appropriateness of items used to rank applicants for a particular occupation.
Although these procedures usually require more effort than the development of
traditional T & E ratings, they do not require the time and resources of an
empirical validity study. When the volume of applications is high the "front
end" work required to develop OSAAQ's is more than offset by savings in
application processing and scoring.

Application 

The OSAAQ methodology is based on a content validity approach to exam
development. Therefore, the application of this exam development
methodology is most appropriate for situations where specific experience,
education and accomplishments are typical of qualified applicants and are
clearly related to important aspects of the job. For purely trainee
level jobs where specific experience and education are neither typical of
applicants nor related to successful job performance, the OSAAQ methodology
may not provide sufficient evidence to satisfy content validity requirements.

Studies  are  planned  to  investigate  the  feasibility  of  using  OSAAQ's  for 
entry  level  trainee  jobs  based  on  content  validity.  It  may  be  possible 
to  use  these  procedures  on  an  interim  basis  pending  the  development  of 
additional  validity  evidence. 

Unassembled  exams  based  on  OSAAQ  methodology  are  currently  being  developed 
for  occupations  which  typically  have  large  numbers  of  applicants  for 
job  vacancies.  These  occupations  are  also  characterized  by  applicant 
populations  that  have  specific  experience,  education  or  accomplishments 
relevant  to  important  aspects  of  the  work  performed.  One  group  of  these 
occupations  is  high  volume  journeymen  (full  performance)  level  jobs  with 
specific  experience  requirements  which  have  a  high  degree  of  similarity 
in  the  work  performed  by  incumbents.  An  example  of  this  type  of  occupation 
for  which  an  OSAAQ  has  been  developed  is  Clerk  Typist  (GS-4)  positions 
(Lyons,  1980).  These  are  full  performance  level  clerical  positions 
which  require  applicants  to  have  clerical  experience.  An  occupational 
supplement with OSAAQ items was developed to measure applicants' clerical
work experience relevant to important aspects of the job. Try out of the
OSAAQ and pilot testing of the exam procedures were completed before implementing
the exam on a nationwide basis for selecting applicants for Federal clerical
jobs.

A second group of occupations appropriate for the OSAAQ methodology is
high volume entry level professional jobs which have specific education or
experience requirements. These jobs usually require applicants to have some
minimum amount of academic preparation in the professional field in addition
to related experience or college education. These basic requirements assure
that applicants have common job related experiences or education relevant to
important aspects of the work in the professional field. This makes it feasible
to develop OSAAQ exams measuring job related experiences, education and
accomplishments relevant to the applicant population and important to job
success in the profession.

Exam  development  projects  either  completed  or  underway  in  this  job  category 
are  Nurse,  Forester,  Biologist  and  Accountant/Auditor.  For  each  of  these 
occupations  the  format  of  OSAAQ  items  and  the  scoring  procedures  have  been 
tailored to the grade levels covered (e.g., GS-4/9 for Nurse and GS-5/7 for
Forester), subspecialties identified and particular characteristics of the
profession. The basic OSAAQ methodology is appropriate to all jobs in this
category but the resulting item formats and scoring procedures are likely to
vary  from  one  occupation  to  another  depending  on  the  type  of  experience,  education 
or  accomplishments  sampled  and  the  way  in  which  they  contribute  to  job  success. 

A third group of occupations for which the OSAAQ methodology is appropriate
consists of jobs in which applicants are likely to have experience, education,
and accomplishments directly related to important aspects of the work but not
required of applicants to be eligible for employment. An example of this type
of job is entry level Computer Specialist. At the entry level there are no
specific experience or education requirements for this job. However, many
applicants have some computer related training or experience. From a technical
point of view it is appropriate to use a content validity strategy for
crediting job related past experience, education, and accomplishments in
evaluating applicants for employment. As a practical matter it makes sense to
consider highly job related experience and education common to many applicants
in the examining plan for the occupation. For the Computer Specialist
occupation OSAAQ items have been developed to obtain information about an
applicant's education and experience for entry level positions. The OSAAQ can
be used in conjunction with a written test of computer aptitude to rank
applicants for entry level computer specialist positions. The combination of
an OSAAQ with a written test results in a multiple assessment strategy which
has the potential for greater validity than a single selection instrument.

Like other procedures based on a content validity strategy, the application of
the OSAAQ methodology to entry level trainee jobs is uncertain. It is difficult
to identify past experience, education or accomplishments that can be considered
a relevant sample of most trainee type jobs. Applicants for these jobs frequently
do not have a common set of job related background experience. This limits
the possibility of developing OSAAQ items which are appropriate to the applicant
population and that sample relevant aspects of work in the occupation. At
present OSAAQ development is limited to high volume occupations with applicant
populations which have common background experiences that are directly related
to the occupation studied.

Basic Requirements of the Model

The  specific  procedural  steps  for  developing  OSAAQs  are  likely  to  vary  depending 
on  the  nature  of  the  occupation  studied,  the  characteristics  of  the  applicant 
population  and  the  availability  of  SME's  and  applicants  during  item  development, 
try  out  and  pilot  testing.  Although  no  one  set  of  "cookbook"  procedures  is 
appropriate  for  OSAAQ  development,  the  study  design  for  each  OSAAQ  development 
project  should  have  the  following  characteristics. 

Behaviorally Based Job Analysis. Since the OSAAQ methodology is based on a
content validity approach it requires a job analysis procedure that summarizes
the work performed in behavioral terms. A work behavior/task inventory is
usually the most appropriate way of summarizing the important elements
of a job. However, any procedure for describing the job in terms of
observable behaviors important to job success is appropriate.

Job  Related  Items.  The  items  developed  for  each  OSAAQ  must  clearly 
relate to important aspects of the job. Items usually sample education,
work  experience,  and  other  accomplishments  which  are  directly  related  to 
important  work  behaviors  required  for  success  on  the  job.  The  closer  the 
correspondence  between  the  content  of  OSAAQ  items  and  the  work  performed 
on  the  job  the  greater  the  support  for  content  validity. 

Objective Item Format. One of the main features in the OSAAQ methodology
is the requirement that all items be of the constructed response type (e.g.,
multiple choice, check list, rating scale). An objective item format assures
a standard measurement instrument which requires all applicants to respond
to the same set of questions (see attachment 1 for an example). Open ended
questions found on typical application or supplemental forms are not
appropriate for the OSAAQ methodology because of the lack of standardization
in the exam instrument and the inherent subjectivity present in scoring
narrative responses.

Item  Try  Out.  Items  are  assembled  into  an  OSAAQ  format  and  administered 
to  job  applicants  usually  as  an  experimental  form  along  with  the  existing 
exam.  Try  Out  of  OSAAQ  items  provides  information  on: 

item relevance - the response rate to each item category is tabulated
                 to determine the relevance of an item for the applicant
                 population.

item clarity - the type and number of errors made in responding to
               items are tabulated to check the adequacy of instructions
               and item format.

item variability - a frequency count is made of responses to each
                   item category to determine those items with
                   adequate variability to differentiate among
                   applicants.

Previous experience in developing OSAAQ exams has demonstrated the importance
of this process in selecting appropriate items and developing scoring
procedures.
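
The three try out tabulations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the data layout, and the reading of "relevance" as the share of applicants giving a valid response are hypothetical and are not taken from the OSAAQ procedures themselves.

```python
from collections import Counter

def tabulate_item(responses, categories):
    """Summarize try out responses to one item (hypothetical layout).

    responses:  one entry per applicant; None means the item was skipped,
                and any value not in `categories` counts as a response error.
    categories: the valid response alternatives for the item.
    """
    n = len(responses)
    counts = Counter(r for r in responses if r in categories)
    answered = sum(counts.values())
    errors = sum(1 for r in responses if r is not None and r not in categories)
    return {
        # relevance: share of applicants who gave a valid response
        "response_rate": answered / n if n else 0.0,
        # clarity: invalid marks point to unclear instructions or format
        "error_count": errors,
        # variability: share of valid responses falling in each category
        "category_share": {
            c: (counts[c] / answered if answered else 0.0) for c in categories
        },
    }

# Hypothetical responses from ten applicants to a four-category item.
summary = tabulate_item(
    ["a", "b", "b", "c", None, "b", "d", "a", "x", "b"],
    categories=["a", "b", "c", "d"],
)
```

An item on which nearly all valid responses fall in a single category would show little variability and would be a candidate for revision or removal.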

SME Item Ratings. Subject Matter Experts (SME) are used to verify the job
relatedness  of  OSAAQ  items  and  to  rate  their  value  in  differentiating 
between  better  and  lesser  qualified  applicants.  Scales  for  evaluating  items 
are  developed  and  independent  ratings  are  obtained  from  SME's.  These 
ratings  along  with  other  evidence  from  the  job  analysis  and  background 
information  are  used  to  establish  item  values  for  ranking  purposes. 

Objective  Scoring  Keys.  Item  selection  and  scoring  of  response  alternatives 
are  based  on  independent  SME  ratings  and  normative  data  collected  during  the 
try  out  with  applicants.  An  objective  scoring  key  is  prepared  for  items 
that  are  consistently  judged  by  SME's  as  differentiating  among  applicants  on 
important  job  related  factors  and  that  have  adequate  variability  based  on  try 
out  results  with  applicants. 
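
The two conditions for keying an item (consistent SME judgments and adequate try out variability) might be screened as follows. This is a minimal sketch: the rating scale, the cutoff values, and the item names are all hypothetical, since the procedures described here do not prescribe numeric thresholds.

```python
from statistics import mean, pstdev

def select_items(sme_ratings, tryout_shares,
                 min_mean=3.0, max_sd=1.0, max_share=0.9):
    """Keep items that SMEs rate consistently high and that spread applicants out.

    All three cutoffs are illustrative assumptions, not OSAAQ standards.
    """
    keep = []
    for item, ratings in sme_ratings.items():
        # consistent: high average value and little disagreement among SMEs
        consistent = mean(ratings) >= min_mean and pstdev(ratings) <= max_sd
        # varied: no single response category absorbs nearly all applicants
        varied = max(tryout_shares[item].values()) <= max_share
        if consistent and varied:
            keep.append(item)
    return keep

# Hypothetical SME ratings (1-5 scale) and try out response shares.
sme_ratings = {
    "typing_experience": [4, 5, 4, 4],
    "filing_experience": [5, 1, 3, 2],      # SMEs disagree
    "high_school_diploma": [4, 4, 5, 4],
}
tryout_shares = {
    "typing_experience": {"none": 0.3, "under 1 yr": 0.3, "1 yr or more": 0.4},
    "filing_experience": {"none": 0.5, "some": 0.5},
    "high_school_diploma": {"yes": 0.96, "no": 0.04},  # almost no variability
}
selected = select_items(sme_ratings, tryout_shares)
```

In this example only the typing item survives: the filing item fails on SME agreement and the diploma item fails on variability.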
Pilot Studies. When time and resources permit, it is desirable to conduct a
pilot study of the OSAAQ exam before full scale implementation. Results of a
pilot study are useful in refining the OSAAQ items and scoring key. The likely
benefit from a pilot study is closely related to the results obtained during
item try out. When significant changes in instructions or item format are
suggested by the try out results, further testing is usually required before
implementing an OSAAQ exam.

Placement Follow-up Studies. The content validity approach used to develop
OSAAQ exams is sufficient for documenting job relatedness requirements. However,
past research on unassembled exams (Johnson et al., 1980; Schmidt et al., 1979)
provides scant evidence for the empirical validity of these procedures. Given
the lack of empirical support for T & E ratings, it is highly desirable to
conduct placement follow-up studies to determine which item types and scoring
procedures should be used to maximize predictive validity. Results from placement
follow-up studies will provide evidence on the predictive validity of
current exams and indicate changes required to improve the validity of OSAAQ
exams. Pilot testing and placement follow-up studies are the only parts of the
OSAAQ exam development process that are not required for establishing job
relatedness based on content validity. However, these two parts of OSAAQ development
should be carried out whenever feasible to improve the quality of the exam and
establish its validity in predicting job success.

General  Procedures 

Although one set of procedures cannot be prescribed to fit all OSAAQ development
projects, results from studies completed or in progress suggest a general
approach for constructing OSAAQ exams which is practical to implement (reasonable
time/resource requirements) and technically sound (job relatedness based on
content validity). The procedural steps described in this section are intended
as a guide to OSAAQ development. Exam development plans prepared for individual
occupations will likely require some modification in the general procedures
to assure that the basic requirements of the OSAAQ model have been met.

1. Review of Existing Occupational Information. For many Federal occupations
there are a variety of informational sources about the work performed and about
applicant characteristics important for successful job performance. Most of
the information is likely to be found in unpublished reports, studies, manuals,
or other institutional documents. Some examples are: OPM classification/qualification
studies, agency position descriptions, special purpose job analysis
studies, career surveys, certification requirements, training programs and
completed application forms. A literature review of related journal articles,
books, and manuscripts is also valuable for identifying relevant published material.
Besides collecting background information for the study, the review process
provides the exam development specialist with occupational knowledge which is
particularly important in planning and conducting a content validity study.


2. Prepare Work Behavior and OSAAQ Items. Background information collected
about the occupation is used to develop a tentative draft of work behavior and
OSAAQ items. Work behavior items consist of simple task statements describing
one aspect of the work performed. The list of work behavior items should
cover all important aspects of the work identified from the background information.
Position descriptions, classification standards and special purpose job analysis
studies are the typical sources of information for developing work behavior
lists. Common sources of background information for developing OSAAQ items
are completed application forms, training/education programs and certification
requirements. These items consist of objective questions with constructed
response alternatives covering specific work experience, education and other
accomplishments directly related to work performed on the job. The content of
OSAAQ items should be as observable and verifiable as possible. The items
should sample experiences that are likely to be common to the majority of
applicants. All work behavior and OSAAQ items that appear reasonable based on
the background information reviewed should be listed. At this stage of the
process the items should be exhaustive and in an objective format.

3. Panel Review by Subject Matter Experts (SME). A panel of SME's representing
agencies/organizations with the largest number of selections is assembled to
revise work behavior and OSAAQ items. The panel should consist of first line
supervisors and/or work leaders of incumbents in the occupation studied. SME's
should have a thorough knowledge of the job being studied and have direct
contact with workers in the occupation. Panel members review the work behavior
items for accuracy and completeness. They also review OSAAQ items and recommend
changes (revisions/additions) in the item pool. Additional items or revisions
of existing items must be amenable to an objective format, deal with observable/
verifiable job related experiences and be reasonable to expect from applicants.

4. Administer Work Behavior Inventory. Revised work behavior items are assembled
into an inventory and administered to a representative sample of incumbents.
Each work behavior item is rated by the incumbent on time spent and importance
scales; the incumbent also indicates whether the item is performed independently,
with guidance, or as a part of training.

5. SME Ratings of OSAAQ Items. An occupational supplemental form consisting
of revised OSAAQ items is prepared for review by a representative sample of
SME's. A sample size of approximately 30 first line supervisors and/or work
leaders representing major users is usually adequate, depending on the number
of agencies involved and the similarity of positions covered by the occupation.
Independent ratings on the value of each item for differentiating between better
and lesser qualified applicants are obtained from the SME's.

6. Try Out of OSAAQ Items. The OSAAQ items are assembled into a supplemental form
and administered to applicants on an experimental basis along with current
examining procedures. A representative sample of 100 completed forms from the
try out group is usually adequate for each occupation if placement follow-up
studies are not to be carried out with the try out group. When placement
follow-up studies are planned for the try out group, all applicants for a
given period of time will be given the form to complete. The duration of time
required is a function of the selection ratio and the rate of placements.

7. Finalize OSAAQ and Develop Scoring Key. In selecting items for the
final version of the OSAAQ, both try out results and SME ratings are
considered. Data from the try out should provide evidence of item relevance,
clarity and variability. The SME ratings should confirm the job
relatedness of items and provide estimates of the value of each response
category for differentiating among better and lesser qualified applicants.

Once the items have been selected and final revisions made, a scoring key
is developed based on SME ratings of item values. The distribution of
scores from the try out group is used to develop a transmutation table for
converting raw scores to final ratings.
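
The scoring key and transmutation table described in step 7 can be illustrated with a short Python sketch. The item names, keyed values, the percentile-rank conversion, and the 70 to 100 rating range are all assumptions made for illustration; the text above states only that the try out score distribution is used to build the table.

```python
from bisect import bisect_right

def raw_score(responses, scoring_key):
    """Sum the keyed value of each response alternative the applicant chose."""
    return sum(scoring_key[item][choice] for item, choice in responses.items())

def build_transmutation(tryout_scores, low=70, high=100):
    """Return a function mapping a raw score to a final rating.

    The rating is placed between `low` and `high` (assumed range) according
    to the share of try out applicants scoring at or below the raw score.
    """
    ordered = sorted(tryout_scores)
    n = len(ordered)

    def transmute(score):
        rank = bisect_right(ordered, score)   # try out scores <= this score
        return round(low + (high - low) * rank / n)

    return transmute

# Hypothetical two-item scoring key derived from SME item values.
scoring_key = {
    "typing": {"none": 0, "under 1 yr": 2, "1 yr or more": 4},
    "supervised others": {"no": 0, "yes": 3},
}
applicant = {"typing": "1 yr or more", "supervised others": "yes"}
score = raw_score(applicant, scoring_key)            # 4 + 3 = 7

# Hypothetical raw scores from a try out group of ten applicants.
transmute = build_transmutation([0, 2, 3, 4, 5, 7, 7, 9, 10, 12])
rating = transmute(score)
```

Because both the key and the table are fixed in advance, every applicant with the same responses receives the same rating, which is the consistency the objective format is meant to provide.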

8. Conduct Pilot Study (Optional). It may be desirable to apply the procedures
on a small scale (e.g., two or three area offices) for a limited period of
time to test out the operational characteristics of the exam before nationwide
implementation. This step can be valuable for checking revisions based
on the try out results and for determining the effects of the scoring key and
transmutation tables on the variability of applicants' ratings. Unforeseen
administrative and operational problems can also be detected and appropriate
changes made to correct them. Pilot testing is not an essential procedural
step, particularly when try out data and SME ratings result in little or no
revision in OSAAQ items and when the scoring key produces reasonable
variability in applicant ratings.

9. Placement Follow-up Studies. Documentation of exam development procedures
based on the OSAAQ model provides adequate evidence for job relatedness based
on content validity. However, further information needs to be collected to
determine if OSAAQ items with a scoring key developed by these procedures
are predictive of job success. Depending on the size of the try out sample
and selection ratio, it may be possible to collect placement follow-up data
from the try out group. Since OSAAQ items are administered on an experimental
basis to the try out sample, there is less likelihood of a restriction
in range than with placement follow-up studies using operational examining
instruments. When it is not practical (sample size and selection ratio
are small) to conduct follow-up studies on the try out group, the studies
can be conducted on applicants selected for jobs after the OSAAQ exam is
implemented. Supervisory rating scales developed for each occupation
studied are a practical method for collecting job performance data on
applicants and ensure a standard criterion measure across organizational
units studied.

The procedures described in this paper provide the basis for developing
OSAAQ exams. The procedures are practical to implement and have adapted
well  to  current  exam  development  efforts.  However,  the  procedures  are 
still  developing  and  it  is  expected  that  improvements  will  be  made  in 
applying  the  OSAAQ  model  to  exam  development  projects  for  other  occupations. 

Summary 

The OSAAQ model provides a methodology for developing unassembled exams
with objective item formats and scoring procedures. This makes it possible
to utilize automated application processing and scoring capabilities
to minimize the time spent by staffing personnel in routine application
rating and register maintenance activities. Objective questionnaire
formats and scoring keys also assure that applicant responses are consistently
evaluated, thus eliminating the unreliability associated with the
narrative response styles of applicants and with the judgements of examiners
that is typical of traditional unassembled exams.

There are two limitations to the application of OSAAQ methodology for exam
development. The first is a practical concern. Due to the "front end" work
required to develop, try out, program and implement automated OSAAQ exams, the
occupation selected should not be one expected to have a small volume of
applications and selections. The second concern is a technical issue. Because
the exam is based on a content validity strategy, OSAAQ items must be based on
job related accomplishments that are common to the life experience of the typical
applicant. This means that for purely trainee jobs, where there are no
objective job related life experiences typical of applicants, the OSAAQ
methodology will usually be insufficient for documenting validity.

Empirical  validity  studies  are  usually  required  to  demonstrate  the  validity 
of  exam  procedures  for  trainee  jobs. 

The  OSAAQ  methodology  provides  a  practical  way  of  developing  job  related 
objective  measurement  instruments  for  situations  where  unassembled  exams  are 
appropriate.  However,  placement  follow  up  studies  are  needed  to  evaluate 
the  validity  of  item  types  and  scoring  procedures  in  predicting  successful 
job  performance. 
CLERICAL EXPERIENCE IN VARIOUS JOBS


Attachment  1 
AN ATTEMPT TO VALIDATE A SHORT JOB EVALUATION QUESTIONNAIRE


Walter  Mann* 

Office  of  Personnel  Research  and  Development 
U.  S.  Office  of  Personnel  Management 
Washington,  D.  C.  20415 


SUMMARY

A job analysis undertaken for the Personnel Office of the U. S. Virgin
Islands (V.I.) used questionnaire data provided by clerical and administrative
personnel to accomplish several objectives. One objective was to develop
a quantitative procedure for assigning grades to classes of positions by
means of a 15-item job evaluation questionnaire. A factor analysis of the
15 job evaluation scales resulted in three interpretable factors: Autonomy,
Complexity, and Environmental Demands. Factor scores for the first two of
these factors were effective predictors of grade (R = .65). This correlation
is fairly high considering the degree of error variance in the grades of
individual positions. The use of more reliable class data, as would be
the case in practice, should produce a higher correlation. To test this
hypothesis, predictor scores for positions from the preceding analysis were
grouped by grade; median predictor scores correlated .92 with actual grade.
Despite the promising results, additional research is needed to validate
the questionnaire.


*The statements expressed in this paper are those of the author and do not
necessarily reflect official policies or opinions. Dr. Patrick McAuley
participated in the development of the Job Analysis Questionnaire.
BACKGROUND


In the mid-1970's the U.S. Virgin Islands (V.I.) requested assistance
from the U.S. Civil Service Commission (CSC) to upgrade their Personnel
Office. The author was sent there on a one-year mobility assignment and
initiated a variety of projects. One of these projects resulted in a
multipurpose job analysis of clerical and administrative positions. The intention
was to provide V.I. management with a data base suitable for making selection,
classification, and job evaluation decisions. A recently published report
of that project (Mann, 1980) is oriented towards selection. The present
paper will focus on job evaluation. It will also describe
the accommodations that had to be made because the job evaluation study
was done within the context of a selection-oriented job analysis.

The V.I. study used the job evaluation factors and their definitions that
had been developed for Federal, white-collar non-supervisory jobs from GS-1
through GS-15 (Anderson & Corts, 1973). These job evaluation factors and
their definitions were based on a review of the literature and actual practice.
They were reviewed by employee unions and professional organizations. A
breakout of job factors into subfactors and definitions of degree levels
for both was carried out by CSC. The five factors, with subfactors in
parentheses, were: Knowledge Required by the Job, Responsibility (Supervisory
Controls; Guidelines), Difficulty (Complexity; Scope and Effect), Personal
Relationships (Personal Contacts; Purpose), and Environmental Demands
(Physical Requirements; Work Environment).

The Anderson and Corts questionnaire has one scale for each subfactor.
A number of their scale points have definitions that appear to include multiple
elements. In the present study only one element per scale was used, thereby
making it easier for an untrained rater to comprehend and also providing
a check on the internal consistency of each subfactor. Another change from
the Anderson and Corts questionnaire was the deletion of content inappropriate
to clerical and administrative personnel; e.g., Personal Contacts was dropped
because it was judged to cover essentially the same domain as Purpose of
Personal Contacts.


METHODOLOGY 


The  Questionnaire 


The Job Analysis Questionnaire (JAQ) was developed specifically for
this study. It contains 24 pages divided into five major parts: background
information (15 items), knowledges (31), skills and abilities (104), job
evaluation scales (15), and activities (99). The original form of the
JAQ was reviewed by two personnel psychologists to determine its
appropriateness in terms of comprehensiveness of coverage and understandability
by subjects. It was pretested in two agencies: the Department of Health
and the Personnel Office. As part of the pretest, subjects were instructed
to suggest additional items. On the basis of the pretest the JAQ underwent
extensive revision, especially with regard to instructions.

The JAQ is in Appendix A of Mann (1980). Only the parts pertinent
to this report will be described here. The background section contains
questions used to identify the sample or interpret the results, e.g., agency,
length of experience, level of education, year of birth, sex, and class title.
The 15 job evaluation scales were designed to measure 8 subfactors: Supervisory
Controls (3 scales), Complexity (3), Guidelines (2), Scope and Effect (2),
Knowledge and Skill Required by the Job (2), Physical Requirements (1), Work
Environment (1), and Purpose of Personal Contacts (1). Each scale had
from three to five statements; incumbents checked the one statement
in each scale that best described their respective positions.

Procedure 


The heads of all V.I. government agencies were sent letters requesting
their participation in the study and asking them to name a project
coordinator. Coordinators were asked to determine the number of full-time
clerical and administrative personnel, GS-1 through GS-24, in their agencies.
Sufficient JAQs were printed and delivered to each coordinator, along with
instructions for administering. If so desired, the JAQ could be completed
anonymously.

The population was limited to full-time clerical and administrative
employees in General Schedule grades 1 through 24 who had been in their
jobs more than three months, and who worked in one of the 11 agencies that
participated in the study. GS-24 was selected as the top grade because
beyond this, to the limit of GS-30, are higher level professionals and
managers. The General Schedule was chosen because it excluded those not
of interest in the present study, such as blue-collar workers, police
officers, firefighters, nurses, doctors, and teachers.

Sample 


The sample consisted of 160 clerical and administrative employees in
78 classes. The individuals in the sample had been with the government
anywhere from 3 months to 31 years (mean = 8 1/2 years), ranged
from 18 to 60 years of age (mean = 36), had been in class from 3 months
to 20 years (median = 4 years), ranged from GS-1 to GS-24 (median = 15), and
had supervisors who ranged from GS-15 to GS-30 (median = 24). In this sample
77% were female, 96% had a high school diploma, and 13% had completed at
least four years of college. In addition, 22% of the incumbents considered
themselves journeymen, 8% master craftsmen, 18% lead workers, and 25%
supervisors.

Accurate population data were not available. Therefore, a comparison
of sample and population data was not possible. The sample did not
appear to be unrepresentative of clerical and administrative personnel
in the V.I.


RESULTS  AND  DISCUSSION 


Job Evaluation Dimensions


An attempt was made to reduce the 15 job evaluation scales to a few meaningful
factors. Anderson and Corts (1973) hypothesized five such factors (Knowledge
Required by the Job, Responsibility, Difficulty, Personal Relationships,
and Environmental Demands) but offered no data to either confirm or deny
the hypothesis. While the present study does not support the hypothesis of
five independent dimensions, it does suggest a reasonable alternative.

The scores on the 15 job evaluation scales were intercorrelated (see
Table 1). From this intercorrelation matrix, three principal component
factors were extracted and rotated orthogonally. These three factors
accounted for 55% of the common variance. The first factor, accounting
for 27% of the common variance, had nine scales that loaded .5 or higher:
Supervisory Controls (3), Guidelines (2), Scope and Effect (2), Purpose
of Personal Contacts, and Knowledge Required by the Job (see Table 2).
This factor was called Autonomy. So this first factor collapsed the
following Anderson and Corts factors: Responsibility, Knowledge
Required by the Job, Personal Contacts, and half (the Scope and Effect
subfactor) of the Difficulty factor.


TABLE  2 


Factor  loadings  of  15  Job  Evaluation  Scales  (N  =  160) 


Job Evaluation Scale              Factor Loading (I, II, III)

Supervisory Controls 1            69
Supervisory Controls 2            78
Supervisory Controls 3            58
Guidelines 1                      62
Guidelines 2                      69
Scope and Effect 1                55  49
Scope and Effect 2                51  33
Personal Contacts                 50  32
Knowledge Required by the Job     59  30
Skill Required by the Job         43  -57
Complexity 1                      47  60
Complexity 2                      85
Complexity 3                      30  80
Physical Requirements             81
Work Environment                  65

Note: Loadings less than .30 deleted. Decimal points deleted.
The second factor accounted for 16% of the common variance. Having
only the Complexity scales loading .6 or higher, it was called Complexity.
In the Anderson and Corts hypothetical framework, Complexity is only a
subfactor, one half of the Difficulty factor.

The third factor accounted for 12% of the common variance. It had
three scales loading .5 or higher: Physical Requirements, Work Environment,
and Skill Required by the Job (negative loading). The first two scales
comprise the Anderson and Corts factor called Environmental Demands, and
the third factor was therefore given that name. The Skill Required by the
Job scale was part of the Anderson and Corts Knowledge Required by the Job
factor, but had to be removed from that factor because for clerical and
administrative personnel the two differ.

Prediction  of  Grade 


Step-wise multiple regression was used to estimate the extent that
grade could be predicted from the data at hand. Factor scores for the three
job evaluation factors were computed from the factor loadings and standardized
scores of the 15 scales. Only the first two sets of factor scores were
accepted by the step-wise multiple regression. The first set of job
evaluation factor scores (Autonomy) correlated .55 with grade; the second
(Complexity), .36. Together they correlated .65 with grade (significant at
the .001 level).
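
The combined correlation can be checked against the two single-factor correlations. The sketch below assumes that the two sets of factor scores are uncorrelated with each other in the sample, which is what orthogonal rotation is intended to produce; under that assumption the squared multiple correlation is simply the sum of the squared single-predictor correlations. The function name is hypothetical.

```python
import math

def multiple_r_uncorrelated(validities):
    """Multiple R for mutually uncorrelated predictors.

    With uncorrelated predictors, R squared equals the sum of the squared
    correlations between each predictor and the criterion.
    """
    return math.sqrt(sum(r * r for r in validities))

# Reported correlations with grade: Autonomy .55, Complexity .36.
r_combined = multiple_r_uncorrelated([0.55, 0.36])
```

The result is about .657, which agrees with the reported .65 to within the rounding of the two input correlations.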


Mann (1980) found considerable error variance in the grades of individual
V.I. positions; based on an analysis of time spent on work activities, 31%
of the positions were found to be misclassified. A more reliable measure of
grade would theoretically correlate higher with the job evaluation factor
scores than .65. To test this hypothesis, predictor scores for positions
were grouped by grade; median predictor scores correlated .92 with actual
grade.
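
The grouping step can be sketched as follows. The data are hypothetical and chosen only to show the mechanics; the helper names are not from the study. One caution applies when reading such results: a correlation computed on group medians is usually higher than one computed on individual positions, because grouping averages out position-level error, so the two coefficients are not directly comparable.

```python
from collections import defaultdict
from statistics import median

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def grade_median_correlation(grades, predicted):
    """Group predictor scores by grade and correlate the medians with grade."""
    by_grade = defaultdict(list)
    for g, p in zip(grades, predicted):
        by_grade[g].append(p)
    levels = sorted(by_grade)
    medians = [median(by_grade[g]) for g in levels]
    return pearson(levels, medians)

# Hypothetical positions: actual grade and predictor score for each.
grades    = [5, 5, 5, 9, 9, 9, 13, 13, 13]
predicted = [4, 10, 6, 7, 9, 14, 8, 13, 15]

r_positions = pearson(grades, predicted)                  # position level
r_medians = grade_median_correlation(grades, predicted)   # grade level
```

In this small example the position-level correlation is about .61, while the correlation between grade and the grade medians exceeds .99.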
RECOMMENDATIONS


Despite the promising results, the job evaluation questionnaire cannot be
recommended for operational use until additional research has been completed.
The present study was limited by a small sample size. For those interested in
using the questionnaire, the following recommendations are made:

(1)  Consider  the  use  of  supervisors  as  subjects;  they  would  have  been 
used  in  the  present  study  if  it  had  been  possible. 

(2) Scale the 15 items in terms of the organization in which they are to be
operationally used. Edit as appropriate.

(3)  Don't  apply  the  questionnaire  to  supervisory  positions  unless  one  or 
more  scales  on  supervising  others  are  developed. 

(4) Perform the factor and regression analyses with a fairly large sample;
1000 would not be too many.
REDUCING ADVERSE IMPACT VIA A MEASURE OF
APPLICANT DISADVANTAGEDNESS

Walter  G.  Mann* 

Personnel  Research  and  Development  Center 
U.S. Office of Personnel Management
Washington,  D.C.  20415 


SUMMARY 

Recent attempts to reduce the adverse impact of examinations have
focused on alternatives to written tests. The present report, on the
other hand, demonstrates how the adverse impact of written tests can be
reduced by correcting for the degree to which a job applicant had been
educationally and/or economically disadvantaged or deprived.

A measure of disadvantagedness containing nine items was internally
and externally validated with 11,931 applicants for a nationwide
examination. Internal validation was demonstrated by a factor analysis that
yielded two factors. External validation was based on the relationships
of scores on the two factors with other variables: minority status, test
performance, and educational level; each assumed relationship was confirmed.

A composite measure of the two factor scores, called D, had a point
biserial correlation of -.31 with passing the test and .54 with minority
status. The adverse impact of the test was substantially reduced by
partialling D out of test performance. It was also demonstrated that
partialling D out of test performance would not necessarily reduce the validity
of the test, and could actually improve it.
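
Partialling one variable out of another is commonly done by linear residualization, sketched below in Python. The exact computation used in the study is not given in this summary, so the function, its name, and the data are illustrative assumptions only.

```python
def residualize(scores, d):
    """Remove from `scores` the part that is linearly predictable from `d`.

    The overall mean is retained so that adjusted scores stay on the
    original test scale; only the linear dependence on `d` is removed.
    """
    n = len(scores)
    mean_s, mean_d = sum(scores) / n, sum(d) / n
    slope = (sum((x - mean_d) * (y - mean_s) for x, y in zip(d, scores))
             / sum((x - mean_d) ** 2 for x in d))
    return [y - slope * (x - mean_d) for x, y in zip(d, scores)]

# Hypothetical applicants: disadvantagedness score D and written test score.
d      = [0, 1, 2, 3, 4]
scores = [90, 80, 85, 70, 65]
adjusted = residualize(scores, d)
```

After the adjustment the scores are uncorrelated with D, so any group difference in test performance that runs through D is removed, while differences unrelated to D are left intact.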


INTRODUCTION

It  has  proven  difficult  to  design  valid  selection  procedures  that  have  no 
adverse  impact.  Some  employers  believe  the  solution  is  to  avoid  the  use 
of  written  tests.  This  approach,  however,  will  fail  to  the  extent  that  the 
alternative  selection  device  has  adverse  impact,  lacks  validity,  or  is  not 
cost  effective. 

The approach adopted in the present paper is not to eliminate the entire
test, but to statistically partial out that part of it that is undesirable.
If whites as a group do better on a test because they have had more advantages
than minorities, then it is a simple matter to reduce the adverse impact of
the test by statistically controlling for disadvantagedness. A measure of
disadvantagedness is, of course, needed to make this correction.


*The statements expressed in this paper are those of the author and do
not  necessarily  reflect  official  policies  or  opinions.  Assisting  at  various 
stages  of  the  project  were  John  Kraft,  Lois  Northrop,  Henrietta  McClure, 
Anthony  Mento,  and  Robert  McKenzie. 
The present paper addresses such questions as:

- Can a measure of disadvantagedness be developed that is appropriate
  for applicants for a particular examination? Such applicants
  would, of course, be more homogeneous than the population at
  large.

- Can the adverse impact of a specific written test be reduced
  when the educational and economic disadvantagedness of
  applicants is partialled out?

- Must a test that does not handicap disadvantaged applicants be
  less valid than its conventional counterpart?


It was assumed (e.g., Bachman, 1972; Tyler, 1965) that disadvantagedness
while one is growing up is related to the following:

- father's education and job (negative relationship);

- mother's education and job (negative relationship);

- lack of familiarity with the English language;

- type of community one grew up in (e.g., fewer advantages
  in the big city than in the suburbs);

- ethnic group (minorities more disadvantaged than whites, but
  with intragroup differences);

- test performance as an adult (negative relationship);

- highest level of education (negative relationship).


METHOD

Procedure 

Three self-administering questionnaires (CSC Form 1289-A (Temp),
CSC Form 1310 (Temp), and CSC Form 1203-N) were given to all applicants
nationwide who took a specific written test during 1978. Completed
forms were optically scanned and the data recorded on computer tape.
The author was assigned to the project after all the data had been
collected.

CSC Form 1289-A (Temp) has seven questions: name, date, social security
number, examining office code, race, sex, and ethnicity.
Non-Hispanic whites were the majority group; all other ethnic groups were
combined to form the minority group.

CSC Form 1310 (Temp) has 16 questions: name, date, year of birth,
social security number, examining office code, type of mathematics
courses taken, questions about conditions while they were growing up
(type of community, whether English was the primary language spoken
in home or school, and the perception of economic and educational advantage
in the home), the most important occupation and the highest level of
education of parents (or guardians), whether they or members of their family
are currently employed by the Federal government, and whether they claim
veteran preference.

CSC Form 1203-N has 10 questions: name, date, social security
number, date available to start work, occupational specialty (applicants
for another job took the same written test), regional preference, Spanish
language ability, educational level, length of qualifying experience, and
previous experience.

All the data generated by the first two forms were available to this
investigator, with the exception of type of math courses taken. Data
from the last form were available only for those applicants who passed
the written test.

The written test consists of 65 items of the following types: 30
vocabulary, 15 English usage, 10 judgment, and 10 logical order of events.
An applicant's score is the number of right answers; 35 is passing. Scores
between 35 and 65 are transmuted on a point-by-point basis to ratings.

Sample

Data were obtained from 11,931 applicants. It was impossible to
determine what percent this was of the total number of applicants; i.e.,
it was impossible to determine the return rate. However, Northrop (Note
1) found a 98% return rate for another nationwide examination.

The mean year of birth was 1951.8 (S.D. = 4.2). Females accounted
for 9.2% of the sample. Included in the sample were 6821 whites, 3699
Hispanics, and 963 blacks.


Scaling the Items Used to Measure Disadvantagedness

The nine items used to measure disadvantagedness are from CSC Form 1310
(Temp) Nov. 1977. Although this form was developed by personnel psychologists
with the intention of measuring socio-economic level, the items appear to be
compatible with the measurement of disadvantagedness; it was only necessary
to score them in the opposite direction from that originally intended. In
fact, two of the items (concerning perceptions of economic and educational
advantage) appear to be more closely related to disadvantagedness than
to socio-economic level.

Scale values for these items were computed on half the sample (S1).
Each alternative was given a value from 0 to 13 (least to greatest
disadvantagedness, based on the judgment of the researcher).
A simple sum score (called initial D) of the nine items was computed for
each individual in S1. The mean initial D for all individuals who
answered an item in any particular way was computed. For example, the
mean initial D of all those applicants who indicated they grew up on a
farm was computed; likewise, the mean initial D for those who grew up in
a small town; etc. For each item, the alternative with the lowest mean
initial D was arbitrarily set to 0; and the alternative with the largest
mean initial D, 20. Alternatives with intermediate initial D's were
assigned whole numbers between 0 and 20 depending on their relative
distances between the high and low points.
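
The rescaling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the example alternatives, and the example initial D values are hypothetical, and simple rounding of a linear interpolation is assumed as the way "relative distances" were converted to whole numbers.

```python
from collections import defaultdict

def rescale_alternatives(responses, initial_d, top=20):
    """Map each alternative of ONE item to a whole number in 0..top.

    responses: the alternative chosen by each respondent (hypothetical labels)
    initial_d: each respondent's initial D (sum of judgment-based item values)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for alt, d in zip(responses, initial_d):
        totals[alt] += d
        counts[alt] += 1
    # Mean initial D of everyone who chose each alternative.
    means = {alt: totals[alt] / counts[alt] for alt in totals}
    lo, hi = min(means.values()), max(means.values())
    if hi == lo:  # degenerate case: all alternatives have the same mean
        return {alt: 0 for alt in means}
    # Lowest mean -> 0, highest mean -> top, others by linear interpolation
    # (rounding to whole numbers is an assumption).
    return {alt: round(top * (m - lo) / (hi - lo)) for alt, m in means.items()}

# Hypothetical responses to the "type of community" item.
responses = ["farm", "farm", "small town", "suburb", "suburb", "big city", "big city"]
initial_d = [40, 44, 35, 20, 24, 50, 54]
scale = rescale_alternatives(responses, initial_d)
print(scale)
```

The same function would be applied once per item, so every item ends up on a common 0 to 20 scale regardless of how many alternatives it offers.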


VALIDATION OF THE MEASURE OF DISADVANTAGEDNESS

Internal Validation of Disadvantagedness Items

Factor analysis was applied to the second half of the sample (S2) to
develop evidence concerning the internal validity of the nine disadvantagedness
items. Responses to the items were assigned the scale values generated
on S1. An iterative factor analysis of the intercorrelations between the
items produced two significant roots. The program iterated until the number
of significant roots was decided and the diagonal estimates stabilized.
Varimax rotation was used to simplify the factor structure. The first
factor accounted for 29% of the variance; the second factor, 11%.

Seven of the nine items, the highest of which were mother's education
(.77) and father's education (.76), loaded at least .42 on the first
factor (see Table 1). This factor was named General Disadvantagedness.
Type of community (.98) was the only item that loaded higher than .08 on
the second factor. This factor was named Community Disadvantagedness.


TABLE 1

Loadings of Two Factors Extracted from Intercorrelations
Between Pairs of Disadvantagedness Items (N = 5965)

                                 Factor
Questionnaire Item             I       II

Mother's Education            .77     .06
Father's Education            .76     .06
Father's Job                  .69     .06
Spoken Language               .52     .08
Educational Advantage         .51     .05
Mother's Job                  .51     .00
Economic Advantage            .42     .06
School Language               .16     .03
Type of Community             .10     .98

External  Validation 


An external check on whether a factor is measuring disadvantagedness
is to determine its relationship with other variables. Factor scores based
on standardized scores and factor loadings were correlated with test
performance, minority status, level of education, sex and year of birth. From the
correlations in Table 2, it is clear that high Factor I scores were associated
with low test performance, being a member of a minority group, low educational
level, and being old.


TABLE 2

Correlations of Two Disadvantagedness Factor Scores
With Selected Dichotomized Variables

                                                  Factor
Selected Variable                              I          II           N

Test Performance (pass = 1, fail = 0)        -.30*      -.09*        5965
Minority Status (minority = 1, white = 0)     .52*       .12*        5877
Educational Level                            -.11*      -.05*        2965
Sex (male = 1, female = 2)                    .00        .00         5900
Year of Birth                                -.15*   [illegible]*    5948

*p < .01

Factor II scores had the same correlational profile as Factor I scores,
the only difference being the magnitude of the correlations. Both factors
therefore exhibited the correlational profile that had been assumed to exist
for disadvantagedness. The two sets of factor scores correlated -.022.

Neither set of factor scores had a sex bias; however, both demonstrated a
bias towards younger applicants. An overall measure of disadvantagedness,
based on the two sets of factor scores, was formed by a conventional multiple
regression with minority status as the dependent variable; the
weights came from S1. The combination of the two factor scores will be
referred to as D. D correlated -.31 with test performance and .54 with
minority status.

ADVERSE IMPACT AND VALIDITY OF THE WRITTEN TEST


Use  of  D  to  Reduce  Adverse  Impact 

The correlation of test performance with minority status is an easily
computed, albeit crude, index of the adverse impact of a test. The fourfold
point correlation between test performance (pass/fail) and minority
status (minority/majority) was -.419 (significant at the .01 level).

D was partialled out of the test scores, producing in effect D-less
test scores. The adverse impact of these D-less test scores was estimated
by correlating them with minority status (r = -.335). Thus, controlling
for D reduced the adverse impact of passing the test substantially, but
hardly completely.

Neither the conventional test nor the D-less test demonstrated a significant
sex bias. In fact, the correlation of test performance with sex was
virtually zero.
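
One way to picture "partialling D out of the test scores" is as a simple regression of test score on D, keeping the residuals as the D-less scores. The sketch below assumes that ordinary least-squares residualization was the procedure (the paper does not spell it out) and uses a small set of made-up applicants; none of the numbers are from the study.

```python
import math

def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residualize(y, x):
    """Return y with its least-squares linear dependence on x removed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Hypothetical applicants: minority status (1/0), D score, raw test score.
minority = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
d_score = [1, 2, 1, 3, 2, 1, 4, 5, 3, 4]
test = [52, 48, 50, 43, 47, 53, 38, 35, 41, 36]

d_less = residualize(test, d_score)      # "D-less" test scores
r_raw = pearson(test, minority)          # adverse-impact index, raw test
r_dless = pearson(d_less, minority)      # adverse-impact index, D-less test
print(round(r_raw, 3), round(r_dless, 3))
```

In this toy data the correlation with minority status shrinks in magnitude once D is removed, which is the direction of the effect reported above; how much it shrinks in practice depends on how strongly D relates to both variables.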

Estimated  Validity  of  D-Less  Test  Scores 

Although the criterion-related validity of D-less test scores has not
yet been determined, estimates of it can be generated based on assumptions
about correlations of a criterion measure with (1) the conventional test and
(2) D. It was assumed that the criterion correlation for the conventional
test was between .2 and .4, and for D between .0 and -.2. Partialling D out
of the conventional test scores and correlating the result with the criterion
produces the validity coefficients in Table 3. As the validity of the
conventional test increases, so does the validity of the D-less test. In no case
does the D-less test lose all its validity, not even when the conventional
test validity is .20 and D correlates -.20 with the criterion. When D
correlates .00 with the criterion, the validity of the D-less test actually exceeds
the validity of the conventional test.

TABLE 3


Validity Coefficients of D-Less Test as a Function of
Conventional Test Validity and Correlation of D with the Criterion

                          Possible Validity of Conventional Test
Possible Correlation
of D with Criterion            .20        .30        .40

        .00                    .22        .33        .45
       -.10                    .19        .30        .41
       -.20                    .15        .26        .35

Note. Validity coefficients are product-moment correlations.
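
The kind of calculation behind Table 3 can be sketched with the standard part (semipartial) correlation formula, in which D is removed from the test scores only. Two caveats: the paper does not state which formula was used, and the correlation between the continuous test score and D is not reported, so the value `R_TEST_D = -0.31` below is borrowed from the pass/fail correlation purely as an assumption. The printed values are therefore not expected to reproduce Table 3 exactly.

```python
import math

def part_correlation(r_criterion_test, r_criterion_d, r_test_d):
    """Correlation of the criterion with test scores after D has been
    removed from the test scores only (part/semipartial correlation)."""
    numerator = r_criterion_test - r_criterion_d * r_test_d
    return numerator / math.sqrt(1.0 - r_test_d ** 2)

R_TEST_D = -0.31  # ASSUMPTION: stand-in for the unreported test-D correlation

for r_cd in (0.0, -0.10, -0.20):          # possible correlation of D with criterion
    row = [round(part_correlation(r_ct, r_cd, R_TEST_D), 2)
           for r_ct in (0.20, 0.30, 0.40)]  # possible conventional validity
    print(r_cd, row)
```

The formula makes the pattern in the table easy to see: when D is uncorrelated with the criterion the numerator is unchanged while the denominator is below 1, so the D-less validity exceeds the conventional validity; as D's correlation with the criterion becomes more negative, the numerator shrinks.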


DISCUSSION  AND  CONCLUSIONS 

The measurement of disadvantagedness in the present study was based
on nine items. Although these items appeared to be effective indicators of
disadvantagedness, they have limitations from a psychological point of
view in that they are global outputs rather than processes or behaviors.
For example, the type of community one grew up in is a very crude overall
indicator of the presence of libraries, museums, role models, reinforcement
of intellectual behavior, and positive peer attitudes toward education.
A better measure of disadvantagedness would rely on specific, psychologically
rich indicators.

Because of the limitations of the items used to measure D and because
of the limited sample (applicants for a single examination during 1978),
generalization to other situations is dangerous. Future studies of other
classes could provide the type of information needed to derive generalizations
about the measurement of disadvantagedness and its relationship with constructs
in the occupational setting.

The  measure  of  D  was  based  on  research  items.  Before  a  measure  of  D 
could  be  used  operationally,  the  items  would  have  to  be  reprinted  as  part  of 
the  examination.  The  possibility  that  this  could  affect  the  candor  of  the 
applicants  should  be  studied. 

If D has low or negative validity, the D-less test will have approximately
the same validity as the conventional test.

Development and Norming of Reading Tests for Air Force Use


This paper reviews development and norming of reading tests targeted
at Air Force personnel. A number of different reading tests are
in current Air Force use. These vary in intended target population
from grades 4-6 (Gates-MacGinitie) to high school through college
(Nelson-Denny). Such divergent level tests have yielded different
normative scores for the same individuals (Mathews, Valentine, and
Sellman, 1978). The purpose of reading tests developed by AFHRL is to
provide standard tests targeted at and normed on Air Force personnel.
Also, costs for test materials should be substantially less. Two
equivalent Air Force Reading Abilities Test (AFRAT 2a and 2b) forms
have been developed for remedial reading programs. A broader level
test (AFRAT 1) was used to obtain normative and validity data on
reading items for 5,000 enlistees. Data reported include item and
test statistics, relationships between test and demographic variables, and
calibration with other reading tests.

DEVELOPMENT AND CALIBRATION OF ENLISTMENT SCREENING
TEST (EST) FORMS 81a AND 81b

John  J.  Mathews  and  Dr.  Malcolm  James  Ree 
Air  Force  Human  Resources  Laboratory 


I.  INTRODUCTION 

The  Enlistment  Screening  Test  (EST)  was  developed  to  reduce 
enlistment  processing  costs  for  transportation  and  boarding.  By 
administering  this  test  at  local  recruiting  stations,  those  applicants 
who  would  most  likely  meet  service  mental  qualification  standards 
could  be  identified  and  sent  to  centralized  testing  stations. 

Previous EST Forms 5 and 6 (Jensen & Valentine, 1976) were
designed to predict Air Force qualification on the Armed Forces
Qualification Test (AFQT) portion of the Armed Services Vocational
Aptitude Battery (ASVAB). These ESTs became obsolete with the
implementation of ASVAB Forms 8, 9, and 10, which do not have Space
Perception items as did prior forms.

The  objective  of  this  study  was  to  develop  and  norm  two  parallel 
EST  forms  for  use  by  military  recruiters  in  predicting  applicant 
success  on  ASVAB  8,  9,  and  10  selection  composites.  The  new  ESTs  were 
designed  in  accordance  with  specifications  which  would  make  them 
appropriate  for  use  by  all  armed  services. 


II.  METHOD 


Subjects 

Service applicants were sequentially administered one of three
experimental test booklets during March and April 1981 at
geographically dispersed samples of recruiting stations representing
each service. A total of 1,882 subjects had sufficient data forwarded
in time to be included in the various analyses. The following
subsamples were formed for specific analyses:


Sample 1. Subjects given booklet AX (N = 527) and for whom
answer sheets were received by 24 April 1981.

Sample 2. Subjects given booklet BY (N = 486) and for whom
answer sheets were received by 24 April 1981.

Sample 3. Subjects given booklet CZ (N = 457) and for whom
answer sheets were received by 24 April 1981.

Sample 4. Subjects given booklet BY (N = 898) and for whom
answer sheets were received by 14 May 1981. Sample 2 is a subset of
Sample 4.

Instruments 

In order to develop two parallel forms of an EST, each of which
would require no more than 45 minutes to administer, three
experimental booklets (AX, BY, and CZ) containing 60 items each and
requiring 1 hour of administration time were assembled. Eight items
were common to all booklets as a check on the comparability of samples
given the different booklets. Each booklet contains 28 Word Knowledge
(WK), 22 Arithmetic Reasoning (AR), and 10 Paragraph Comprehension
(PC) items. These items were selected based on available data which
indicated that they would discriminate best in the range of the 5th to
55th percentile for samples stratified on AFQT scores. The goal for
EST items was to have about 40% with maximum discrimination around the
15th percentile, 40% with maximum discrimination around the 30th
percentile, and 20% with maximum discrimination around the 45th
percentile in the AFQT target population. This distribution maximized
measurement at ability levels where most selection decisions are
made. Items from the three experimental booklets were selected for
two operational forms, each containing 40 to 50 items.

Data  Editing 


The following steps were taken to ensure that only valid data were
included in analyses:

1. Cases with neither a social security account number nor a name were
deleted.

2.  Correct  designation  of  booklet  form  was  checked  by  scoring 
the  test  separately  using  the  answer  key  for  all  three  booklets.  If 
an  appreciably  higher  score  would  have  been  obtained  based  on  the  key 
for  another  form,  then  the  booklet  designation  was  changed.  If  the 
highest  form  score  was  below  8  (out  of  60),  indicating  that  the 
examinee  was  not  trying  on  the  test,  the  case  was  deleted. 

3.  Cases  with  no  AFQT  scores  were  deleted  from  all  analyses 
except  preliminary  item  analyses. 

4. Cases with a standard error of estimate greater than
±3.5 points from the predicted AFQT based on EST scores were deleted.

Less  than  2%  of  the  cases  were  eliminated  from  all  analyses  based  on 
these  editing  procedures. 
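
Editing step 2 (checking the booklet designation by scoring each answer sheet against all three keys) can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the `margin` that stands for "appreciably higher" is an assumed value because the paper gives no figure, and the 12-item keys are toy stand-ins for the real 60-item keys. Only the cutoff of 8 comes from the text.

```python
def score(answers, key):
    """Number of responses that match the answer key."""
    return sum(1 for a, k in zip(answers, key) if a == k)

def check_booklet(answers, recorded_form, keys, margin=5, min_score=8):
    """Return the booklet form to use, or None if the case should be deleted.

    margin    -- ASSUMPTION: how much higher another key must score to
                 count as "appreciably higher" (not stated in the paper)
    min_score -- highest form score below this means the examinee was
                 judged not to be trying (8 out of 60 in the paper)
    """
    scores = {form: score(answers, key) for form, key in keys.items()}
    best_form = max(scores, key=scores.get)
    if scores[best_form] < min_score:
        return None
    if scores[best_form] - scores[recorded_form] >= margin:
        return best_form           # change the booklet designation
    return recorded_form

# Toy 12-item keys standing in for the 60-item booklets.
keys = {"AX": "ABCDABCDABCD", "BY": "DCBADCBADCBA", "CZ": "AABBCCDDAABB"}

print(check_booklet("DCBADCBADCAA", "AX", keys))  # sheet mislabeled as AX
print(check_booklet("AAAAAAAAAAAA", "AX", keys))  # low score on every key
print(check_booklet("ABCDABCDABCA", "AX", keys))  # correctly labeled sheet
```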

Data Analysis


Item  statistics  were  generated  to  aid  item  selection  for 
operational  forms.  Descriptive  statistics  including  correlations  and 
bivariate  frequency  distributions  of  EST  with  ASVAB  selection 
composites  were  computed.  The  EST  was  calibrated  to  AFQT  through 
equipercentile  equating. 


III.  RESULTS  AND  DISCUSSION 
Analyses  on  Stratified  Samples 

Samples 1 through 3 were rectilinearly stratified on AFQT
percentile by random duplication of subjects so that an equal portion
(10%) of the sample was represented in each AFQT decile. This
procedure equated the samples on ability. Classical item analysis,
including item difficulty and discrimination with AFQT as an external
criterion, was then accomplished for each experimental form.
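
A minimal sketch of stratification by random duplication is given below. It assumes that cases in the smaller deciles were randomly re-drawn until every decile matched the size of the largest one; the paper does not give the exact rule, and the function name, the decile rule, and the example cases are hypothetical.

```python
import random
from collections import Counter

def stratify_by_duplication(cases, decile_of, rng):
    """Duplicate randomly chosen cases so every observed decile has the
    same number of cases as the largest decile (ASSUMED target size)."""
    groups = {}
    for case in cases:
        groups.setdefault(decile_of(case), []).append(case)
    target = max(len(members) for members in groups.values())
    balanced = []
    for decile in sorted(groups):
        members = groups[decile]
        extras = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extras)
    return balanced

def decile_of(case):
    """Decile index 0..9 from an AFQT percentile stored as case[1]."""
    return min(case[1] // 10, 9)

# Hypothetical (id, AFQT percentile) cases with unequal decile counts.
cases = [(1, 5), (2, 12), (3, 15), (4, 18), (5, 25), (6, 95), (7, 97)]
balanced = stratify_by_duplication(cases, decile_of, random.Random(0))
counts = Counter(decile_of(c) for c in balanced)
print(counts)
```

A decile with no cases at all cannot be filled by duplication, so the sketch only equalizes the deciles that are actually observed.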

Items were selected to yield two parallel EST forms. All the
items for one operational form (EST 81a) were chosen from the booklet
(BY) which contained the greatest number of items in the appropriate
difficulty and discrimination ranges. This procedure subsequently
allowed direct generation of EST total score frequency distribution
and bivariate statistics involving EST 81a and AFQT, since subjects in
sample 4 were given all items which would be in that operational EST
form.

Considerations  in  selecting  items  for  the  two  forms  included: 

1.  A  significant  positive  correlation  with  AFQT. 

2.  Maximum  discrimination  in  the  desired  ability  range  (5th 
to  55th  percentile) 

3.  Content  similar  to  AFQT  (proportion  of  Verbal  and 
Arithmetic  Reasoning  items). 

4.  Necessity  of  having  the  two  forms  parallel. 

Two EST forms of 48 items each which met the desired
specifications were constructed. A comparison of the content of these
forms, designated EST 81a and 81b, and AFQT is given in Table 1.
Because of administration difficulties, it was decided to exclude the
speeded Numerical Operations (NO) item types from the EST. After
deleting NO, AR items equally comprise 37.5% of AFQT and EST content.
There are relatively more WK and fewer PC items in EST than in AFQT.
However, these two verbal item types are highly intercorrelated.
Because WK items require less time to complete, an abundance of these
items saves testing time.


Difficulty (p) levels of the two forms based on stratified samples
are indicated in Table 2. Item p's range from .57 to .89. Mean p's
for the ESTs are virtually identical (.744 and .742), and the
distributions are similar.

Biserial  correlations  (validity  estimates)  of  items  with  AFQT 
percentiles  ranged  from  .29  to  .63,  with  about  75%  between  .45  and  .59 
(see  Table  3). 

Again,  the  forms  appear  quite  comparable,  with  similar  biserial  r 
means  (.476  and  .496)  and  distributions.  An  internal  consistency 
reliability  (KR-20)  of  .93  was  obtained  for  EST  81a.  Since  no 
subjects  were  administered  all  items  in  EST  81b,  test  statistics, 
including  reliability,  were  not  computed  for  this  form. 
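
For readers unfamiliar with the KR-20 coefficient cited above, the following sketch computes it from a matrix of right/wrong item scores. The formula is the standard Kuder-Richardson formula 20; the use of the population variance (dividing by n) is a common convention assumed here, and the five-examinee data set is invented for illustration.

```python
def kr20(item_matrix):
    """KR-20 internal-consistency reliability.

    item_matrix: one row per examinee, each row a list of 0/1 item scores.
    """
    n = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n
    # Population variance of total scores (ASSUMED convention).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum over items of p * q, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Invented data: 5 examinees by 4 items.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(example)
print(round(reliability, 3))
```

Because KR-20 needs every examinee's response to every item on the form, it could be computed for EST 81a (all items came from booklet BY) but not for EST 81b, whose items were split across booklets AX and CZ.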

Item Response Theory (IRT) item analytic indexes (Lord & Novick,
1968) were also computed for EST 81a based on the Birnbaum (1968)
three-parameter logistic model. The three indexes are: a (item
discrimination), b (item difficulty in Z metric), and c (probability
of guessing) (see Ree, 1979, for a detailed description of these item
parameters). Two types of analyses were completed. The first was
based on EST score as an internal criterion, and the second was based
on AFQT percentiles transformed into Z scores for the sample. Table 4
presents the mean a, b, and c values for these analyses on EST 81a.

As would be expected, the mean a was somewhat higher in the
internal compared to external criterion analysis, although both mean a's
were relatively high (1.3 and 1.1, respectively). Both mean b's were -.62,
corresponding to a mean percentile of 27. The mean of the desired
(targeted) distribution of b would be about -.67. The mean c was .20
based on the internal analysis. The classical probability of guessing
for a four-choice item is .25. The mean c based on AFQT ability estimates
was somewhat higher, .32.
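
The three-parameter logistic model and the conversion of a Z-metric difficulty to a percentile can be illustrated briefly. The scaling constant `D = 1.7` is the conventional value that makes the logistic curve approximate the normal ogive; whether the original analysis used it is an assumption, as is the function name.

```python
import math
from statistics import NormalDist

def three_pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the three-parameter
    logistic model (D = 1.7 is an ASSUMED scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Mean parameters reported for EST 81a, internal criterion.
a, b, c = 1.33, -0.62, 0.20

# At theta == b the curve is halfway between c and 1.
p_at_b = three_pl(b, a, b, c)

# Percentile equivalents of Z-metric difficulties under a normal distribution.
pct_mean_b = round(100 * NormalDist().cdf(-0.62))
pct_target_b = round(100 * NormalDist().cdf(-0.67))
print(p_at_b, pct_mean_b, pct_target_b)
```

The normal-curve conversion agrees with the text: a mean b of -.62 sits near the 27th percentile, close to the targeted mean of about -.67 (roughly the 25th percentile).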

Table 5 gives the mean and standard deviation (SD) of EST 81a
based on sample 2 stratified on AFQT. Summary statistics for EST 81b
would be quite comparable, since the available data indicate it is
indeed parallel to EST 81a.

Intercorrelations are also shown in Table 5. The Pearson
product-moment correlation (r) of EST 81a with AFQT percentile is
.83. This indicates that about 69% of the variance in AFQT scores can
be predicted from EST scores. The two EST subscales correlate with
EST total to about the same degree as the like-named ASVAB composites
correlate with AFQT. The r's of the Verbal (VE) and Arithmetic
Reasoning (AR) scales with EST total are .95 and .88, respectively.
The corresponding r's of ASVAB 8, 9, and 10 VE and AR with ASVAB 8, 9,
and 10 AFQT are .93 and .8[?], respectively (Ree, Mathews, Mullins, &
Massey, 1981).

Distributional Statistics

An  equipercentile  calibration  was  accomplished  for  the  purpose  of 
converting  EST  scores  to  equivalent  AFQT  percentiles.  Table  6  gives 
the  computed  percentiles  for  sample  4  (N  =  869)  on  EST  81a  and  AFQT 
and  the  AFQT  percentile  (from  ASVAB  conversion  tables)  which  is 
equivalent  to  the  computed  percentile  for  each  EST  score. 
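
The idea of an equipercentile calibration can be sketched as follows: each score on one test is matched to the score on the other test that has the nearest cumulative percentage of cases below it. This is a simplified illustration under assumptions; the actual procedure (including any smoothing or use of ASVAB conversion tables) is not detailed in the paper, and the function names and data are hypothetical.

```python
from bisect import bisect_left

def percent_below(scores):
    """Return {score: percent of cases strictly below that score}."""
    ordered = sorted(scores)
    n = len(ordered)
    return {s: 100.0 * bisect_left(ordered, s) / n for s in set(ordered)}

def equipercentile_table(x_scores, y_scores):
    """Map each observed x score to the y score whose cumulative percent
    below is closest (ties broken toward the lower y score)."""
    x_below = percent_below(x_scores)
    y_below = percent_below(y_scores)
    table = {}
    for x, px in sorted(x_below.items()):
        table[x] = min(y_below, key=lambda y: (abs(y_below[y] - px), y))
    return table

# Hypothetical paired distributions (not the study data).
est_scores = [1, 2, 2, 3, 4]
afqt_scores = [10, 20, 20, 30, 40]
table = equipercentile_table(est_scores, afqt_scores)
print(table)
```

Read this way, each row of Table 6 pairs an EST score with the AFQT percentile at which roughly the same share of the sample falls below.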

The  EST  score  distribution  is  negatively  skewed,  indicating 
potential  for  good  discrimination  at  lower  ability  levels,  which  was 
one  of  the  test  construction  goals.  The  median  EST  score  is  40  (out 
of  48  points)  and  this  is  equivalent  to  an  AFQT  percentile  score  of  50. 

The percentages of subjects within various AFQT score categories
are shown for EST scores (grouped in intervals to increase Ns) in
Table 7. The AFQT categories represent various service cutoff score
boundaries.

Relationship  of  EST  to  General  Composites 

The correlation between the EST and the General-Technical (General
for Air Force) composite for subjects given booklet BY in March 1981
(N = 270) was .86. This strong relationship was expected due to
similarity in content. Since percentile equivalents for this
composite are based on the same reference test used in norming AFQT
and since it correlates highly with AFQT (r = .97), the equipercentile
calibration (Table 6) would also apply to General (GT) percentiles.

IV.  CONCLUSIONS 

Two  parallel  forms  of  an  EST  have  been  developed  and  calibrated  to 
the  AFQT  selection  composite  from  ASVAB  8,  9,  and  10.  These  tests 
appear  to  meet  administrative  and  psychometric  specifications 
adequately.  All  items  in  both  forms  correlate  positively  with  total 
test  score  and  AFQT  scores  and  are  in  an  appropriate  range  of 
difficulty  (from  average  to  very  easy)  for  use  in  prescreening  service 
applicants. 

The  EST  should  be  a  highly  reliable  instrument  (internal 
consistency  coefficient  =  .93).  The  test  items  appear  to  discriminate 
well  throughout  a  range  which  includes  major  service  selection  cutoff 
points  (AFQT  percentiles  15  to  45).  The  two  EST  forms  appear  parallel 
based  on  highly  similar  distributions  of  item  difficulty  and  criterion 
correlation  values.  EST  scores  predict  AFQT  percentiles  quite  well 
(r  =  .83).  In  addition,  EST  content  is  similar  to  that  of  AFQT. 

Interpretation  of  EST  scores  is  provided  by  an  equipercentile 
calibration  to  AFQT  (Table  6).  This  calibration  also  applies  to 
prediction of GT percentile scores.

Table 1. Content of AFQT and EST 81a - 81b

                                   Number of Items
Item Types                         AFQT        EST

Arithmetic Reasoning (AR)           30          18
Word Knowledge (WK)                 35          23
Paragraph Comprehension (PC)        15           7
(Verbal = WK + PC)                 (50)        (30)
Numerical Operations (NO)           50*          -
Total # of items                   130          48

*Speeded test with 50 items. NO raw scores are
weighted .5 in the AFQT composite.


Table 2. Distributions of Item Difficulties for EST 81a and 81b

                     81a (Booklet BY)      81b (Booklets AX & CZ)
Difficulty (p)         N        %             N        %

.80 - .89             12       25            12       25
.70 - .79             23       48            19       40
.60 - .69             12       25            16       33
.00 - .59              1        2             1        2

Mean p                  .744                    .742


Table 3. Distributions of EST Item Validities with AFQT

                     81a (Booklet BY)      81b (Booklets AX & CZ)
Validity (r bis)       N        %             N        %

.60 - .99              2        4             3        6
.45 - .59             35       73            35       73
.30 - .44             11       23            11       23
.00 - .29              1        2             0        0

Mean r biserial*        .476                    .496

*Based on r to Z transformations.

Table 4. Means of IRT Item Parameters for EST 81a

                          a         b         c

Internal Criterion       1.33      -.62      .20
AFQT Criterion           1.10      -.62      .32

Table 5. Descriptive Statistics and Correlations for EST 81a
in an AFQT-Stratified Sample (N = 445)

                                      Standard         Intercorrelations
Scale                        Mean     Deviation     VE      AR      EST     AFQT

Verbal Ability (VE)          22.7        7.4       1.00     .69     .95     .77
Arithmetic Reasoning (AR)    13.0        4.7               1.00     .88     .75
EST Total                    35.7       11.1                       1.00     .83
AFQT Percentile              49.5       28.4                               1.00


Table 6. Equipercentile Calibration of EST to AFQT (N = 869)

              Cumulative %         AFQT          Cumulative %
EST Score     Below EST Score      Percentile    Below AFQT Score

  1-11             1.4                 4               0.9
  12               1.7                 5               1.5
  13               2.3                 6               2.6
  14               2.8                 6               2.6
  15               3.2                 8               3.3
  16               3.6                 8               3.3
  17               4.6                10               4.5
  18               5.4                12               5.5
  19               6.2                13               6.2
  20               6.9                14               7.4
  21               7.4                14               7.4
  22               8.5                15               8.6
  23               9.8                15               8.6
  24              10.6                16              10.9
  25              11.5                18              12.1
  26              12.7                19              12.8
  27              14.4                20              14.8
  28              16.0                22              16.6
  29              17.5                23              17.7
  30              19.9                25              20.6
  31              22.0                28              22.9
  32              24.2                30              25.5
  33              27.4                33              28.8
  34              30.6                36              32.5
  35              34.3                40              36.0
  36              38.0                42              38.4
  37              41.1                44              41.4
  38              44.9                48              45.3
  39              47.6                49              47.8
  40              50.5                50              50.5
  41              56.5                54              56.8
  42              60.4                59              62.6
  43              64.8                63              66.9
  44              70.0                68              72.0
  45              75.5                74              77.6
  46              82.9                80              85.4
  47              90.0                87              92.8
  48              96.0                95              93.0

Table 7. Distribution of EST Scores by AFQT Category

                                      AFQT Category
EST         1-15        16-20       21-30       31-49       50-64       65 & >
Score      N     %     N     %     N     %     N     %     N     %     N     %      N

 1-12     18  90.0     2  10.0     -     -     -     -     -     -     -     -     20
13-14      7  87.5     -     -     -     -     1  12.5     -     -     -     -      8
15-16      6  50.0     3  25.0     3  25.0     -     -     -     -     -     -     12
17-18      9  64.3     3  21.4     2  14.3     -     -     -     -     -     -     14
19-20      5  50.0     4  40.0     1  10.0     -     -     -     -     -     -     10
21-22      5  23.8     7  33.3     6  28.6     2   9.5     1   4.8     -     -     21
23-24      4  26.7     5  33.3     3  20.0     3  20.0     -     -     -     -     15
25-26      4  16.0     6  24.0     5  20.0     6  24.0     3  12.0     1   4.0     25
27-28      3  11.1     6  22.2     6  22.2     9  33.3     3  11.1     -     -     27
29-30      5  12.8     6  15.4    12  30.8    12  30.8     2   5.1     2   5.1     39
31-32      2   4.3     4   8.5    11  23.4    22  46.8     6  12.8     2   4.3     47
33-34      5   8.3     4   6.7    18  30.0    23  38.3     7  11.7     3   5.0     60
35-36      1   1.7     2   3.4     8  13.6    27  45.8    18  30.5     3   5.1     59
37-38      1   1.8     1   1.8    11  19.3    22  38.6    18  31.6     4   7.0     57
39-40      -     -     1   1.3     5   6.5    31  40.3    23  29.9    17  22.1     77
41-42      -     -     -     -     -     -    22  30.6    24  33.3    26  36.1     72
43-44      -     -     -     -     2   2.2    10  10.8    34  36.6    47  50.5     93
45-46      -     -     -     -     -     -     3   2.4    21  16.7   102  81.0    126
47-48      -     -     -     -     -     -     -     -     6   6.9    81  93.1     87

N         75          54          93         193         166         288

Abstract

After about 20 years, a follow-up was done of the Officer Evaluation
Center (OEC), where 900 first and second lieutenants underwent 3 days of
assessment of their technical, administrative and combat skills. From 15
exercises, over 2000 measures were taken of each officer. These observations
were reduced through factor and other analyses to 341 variables which
yielded [number illegible] cross-situational factors.

During  the  current  follow-up,  the  ability  of  25  rmaining  summary  var¬ 
iables  to  discriminate  between  the  group  of  officers  who  left  the  Army  after 
their  initial  2-year  commitment  and  the  group  remaining  for  a  full  career  term 
was  tested.  A  second  discriminant  analysis  was  performed  among  those  who  left 
active  duty  after  2  years.  These  were  grouped  ’  tcording  to  rank  (1st  Lieuten¬ 
ant  or  Captain)  at  the  time  of  completing  their  reserve  commitments. 

Both  analyse?  yielded  significant  discriminant  functions.  In  the  former 
(active  vs.  discharged)  analysis  65.38%  of  cases  were  correctly  classified  and 
in  the  latter  67.87%. 
Background

In the late 1950's and early 1960's research was conducted by the U.S.
Army Personnel Research Office to develop means of identifying officers
with the aptitudes and characteristics to successfully meet the demands of
different types of command responsibility. In essence, the research program
centered around the development of the Differential Officer Battery (DOB).
This battery included measures of information ranging from military tactics
to the physical sciences, sports and the arts. Biographical reports and
self-descriptive statements of interests and attitudes were also included.
In the process of development and refinement, the battery was administered
to 6500 active duty officers in 1958 and 1959 and about 4000 in 1961 and
1962 (Helme, Willemin and Grafton, 1971).

Suitable criterion measures were needed to validate this instrument.
Ratings by peers and superiors were used as part of the validation effort.
However, these were not totally satisfactory in that the DOB had been
designed to differentially assess potential for combat, technical and
administrative assignments. An officer's job rating was relevant only to
his current assignment, which could be representative of only one of the
three categories.
It was decided that a series of situational tests would be administered
to serve as additional validation criteria. These would allow assessment of
each officer in each of the three areas and provide the added advantage of
uniformity of tasks and standardization of observations.

For the purpose of administration of these situational tests, the Officer
Evaluation Center (OEC) was established at Fort McClellan, Alabama on 1
March 1962. The first year of the center's operation was spent staffing,
training assessors and finalizing procedures; the first officers who had
taken the DOB were not tested until February of 1963. Final revisions were
made based on this "shakedown" sample and for-the-record testing began in
June of 1963.

In  the  process  of  refinement,  all  OEC  exercises  had  been  worked  into  a 
central  scenario.  This  framework  was  that  of  a  simulated  Military  Assistance 
Advisory  Group  (MAAG)  Headquarters.  New  assessees  were  told  to  assume  that  they 
were  "reporting  for  duty"  at  this  MAAG  Headquarters  located  in  a  friendly  host 
nation.  All  tests  then  became  a  succession  of  assignments  to  be  performed 
while  temporarily  awaiting  reassignment  to  a  field  unit  (Willemin,  1964). 

Exercises  were  selected  to  provide  reliable  although  not  necessarily 
complete  coverage  of  the  technical,  administrative  and  combat  areas.  All 
exercises  had  to  meet  certain  conditions.  They  were  required  to  be  able  to 
be  performed  without  specialized  training  and  experience,  to  be  recognizable 
as  representative  military  requirements,  and  to  have  militarily  meaningful 
outcomes  characteristic  of  good  or  poor  performance. 

Exercises were drafted with the assistance of subject-matter experts,
field tested and then technically reviewed at the appropriate branch
schools. They were designed to include measures of the following categories
of behaviors: perceiving situational elements, judging future developments,
analyzing problem elements, planning future action, organizing resources,
deciding the course of immediate action, taking the initiative to act,
communicating orders and information, training and directing subordinates,
and persisting under stress (Willemin, 1964).

Each exercise was to be primarily representative of one of the three areas
of interest. There were five exercises developed in each of the three
areas. A summary of these is given as follows:

Combat  Exercises: 


1.  March  Order.  Examinee  plans  a  tactical  road  march  and  reacts  to 
interruptions  by  senior  and  subordinate  personnel. 

2.  Observation Post. Examinee directs fire onto visible targets. He
must perceive terrain, enemy activity and targets; estimate range and
communicate this information.

3.  Security Mission. Examinee must anticipate enemy actions, quickly plan
offensive and defensive actions and direct subordinates through
face-to-face conference.

4.  Roadblock. Examinee must apply basic tactical principles and
communicate important information to others.

5.  Route  Reconnaissance  Patrol.  Examinee  must  cope  with  persistent 
obstructions  to  mission  progress,  respond  to  critical  situational  factors 
and  withstand  psychological  stress  under  simulated  prisoner-of-war 
conditions. 

Technical  Exercises: 

1.  Communications  Exhibit.  Examinee  trouble  shoots  technical  equipment 
and  must  use  subordinates  as  effectively  as  possible. 

2.  Automotive  Inspection.  Examinee  detects  equipment  deficiencies  and 
recommends  and/or  performs  corrective  actions. 

3.  Road  Damage  and  Radiation  Survey.  Examinee  must  organize  teams, 
train  subordinates,  collect  and  communicate  information  and  make  plans  under 
conditions  of  time  pressure,  obstacles,  harassment  and  fatigue. 

4.  Airfield  Layout.  Examinee  must  use  technical  information  to  select 
an  airfield  site  and  compute  the  necessary  length  of  a  runway. 

5.  Weapons  Assessment.  Examinee  reports  on  the  characteristics  of  an 
enemy  weapon  from  a  technical  intelligence  point  of  view. 

Administrative Exercises:

1.  Improper Supply Records. Examinee analyzes supply records, writes a
summary memorandum and (tactfully) communicates discrepancies.

2.  Office Management. Examinee must organize administrative tasks and
correct improper office procedures.

3.  Production Analysis. Examinee analyzes production data, organizes
unit for efficient operation, and communicates plans.

4.  Site  Selection.  Examinee  must  use  logistical  judgment  to  interpret 
information  and  consider  factors  in  site  selection. 

5.  Highway  Traffic  Plan.  Examinee  must  plan  logistical  support  for  a 
large  scale  tactical  operation  and  respond  to  rapid  political  and  military 
changes. 

Each officer went through the exercises as an individual. The entire
set required 3 days to administer. The combat setting was made as realistic
as possible with 17 officers and 41 enlisted personnel playing the roles of
United States, allied and enlisted personnel. The first day's exercises
were carried out under time pressure but "peacetime" conditions. On the
second day the examinee was awakened at 0230 after about four hours' sleep
and told that the host nation was at war. The remainder of the exercises
were carried out under "emergency" conditions and increasing fatigue on the
part of the examinee (Helme, Willemin and Grafton, 1971).

Method 


Sample 

The original sample of OEC participants was drawn from the pool of 4000
lieutenants who took the DOB between 1961 and 1964. Of these, about 900
attended the OEC after one or two years of active duty. Both first and
second lieutenants were included, as were graduates of the U.S. Military
Academy and both Reserve and Regular Army graduates of the Reserve Officer
Training Corps (ROTC). The lieutenants represented 10 different combat
arms, combat support and combat service support branches. Only about 737 of
the original 900 participants are included in the data base. The remaining
officers were members of the first thirty-odd groups used as a "shakedown
sample" to refine measures and exercises (Helme, Willemin and Grafton,
1971).

The first step of the current research was to determine where these 737
men were in relation to their military careers and what data were available
to indicate whether their performances at the OEC bore any relationship to
their later degrees of military success.

Through the Army's locator service, we were able to find the names of
101 OEC participant officers still on active duty. At the time of follow-up
sampling (1980) these included 1 Colonel, 86 Lieutenant Colonels, 11 Majors
and 3 whose current ranks were indeterminate from the information provided.
The names of 412 additional OEC participants were found through computer
search at the National Personnel Records Center (NPRC) in St. Louis,
Missouri. The location of their records at NPRC indicated that these men
had been discharged from all active and/or reserve military commitments.

The military history of the remaining 224 participants may be considered
unknown. However, there is a third major repository of military records,
the Reserve Component Personnel and Administrative Center (RCPAC) in
St. Louis, Missouri. This center houses records of individuals involved
with Reserve Component (National Guard, etc.) units. It is not unlikely that
many of the remaining OEC records could be found there, but we have not yet
been able to make suitable arrangements to obtain information from this
center.

Information  Gathered 


It was quickly determined that a limited amount of information would be
available for the "discharged" subsample located at NPRC. A much greater
variety of information is available for the subsample of officers still on
active duty, which would necessitate a much more thorough process of
development for a "criterion-of-success" score. Therefore, it was decided to
obtain available information on the "discharged" group as a first step.

We found only certain useful forms to be contained as a rule in the
majority of NPRC folders. These were:

Form  DD  214 — Report  of  Transfer  or  Discharge 

Form  USAAC  872 — Discharge 

Form  67-5,  67-6 — US  Army  Officer  Evaluation  Report 

Only those items were taken from these forms which might reasonably be
considered indicative of military success. These were: 1) number of years
of active military service; 2) rank at the time of discharge from active
duty; 3) rank at the time of discharge from the reserve component; 4) reason
for discharge from the reserve component; 5) Officer Evaluation Report totals.

OEC  Summary  Variables 

During the conduct of the assessment center more than 2000 observations
and judgments were recorded on each assessee. These consisted of checklists
of specific behaviors, scale ratings and quantitative summations of written
products. Initially these items were analyzed by factor analyses conducted
separately for each exercise. Intercorrelation and factor analysis of these
scores yielded 342 scales or variables.

The number of variables was then reduced to 256 by elimination of those
which were linear combinations of less complex ones and those on which 90%
or more of the participants scored alike. Further factor analysis resulted
in the identification of a set of 30 factors, all but two of which were
specific to a single task.

To find cross-task factors, "marker" variables were chosen for each
factor. These were then combined with additional independent scales,
refactored and rotated. A set of eight factors was identified and analysis
using these 8 factors was then extended to the remaining variables (Helme,
Willemin and Grafton, 1971).

Information remaining from the original set of OEC data consists of 25
summary variables. These scores represent 7 of the original 15 exercises
(3 from the administrative area and 2 each from the combat and technical
areas). These summary scales were part of the 342 variables derived in the
initial set of analyses. About half are shown as loading on the final 8
cross-situational factors derived from the analyses. Few (about 5) are seen
as markers or variables loading on the intermediate set of 30 factors.
It is likely that many of them were omitted from this stage of analysis
because they were linear combinations of simpler variables. A summary
description of the variables is provided at Table 1.

Measures of leader characteristics resulting from Differential Officer
Battery (DOB) development were correlated with OEC variables and factor
scores. A number of significant correlations were found and differential
prediction of the combat and technical-managerial leadership domains was
shown (Helme, Willemin and Grafton, 1974).
TABLE I


SUMMARY  DESCRIPTION  OF  REMAINING  OEC  VARIABLES 


Area 


Exercise 


Variable 


Loads  on  Cross- 
Task  Factor 


Administrative   Highway Traffic Plan   Factor Total

Attention to Data
Requirements

Office Management   Sequencing of Operations

Retained Procedures
Site Selection   Factor Total


8-Technical  Skills 
1-Technical  Managerial 
Leadership 


Technical  Automotive  Inspection  Factor  Total 

Identifying  Information 
Airfield  Layout  Sites  Weighted  Scale 

Basic Geographical
Considerations
Operational Hazards
Engineering Considerations
Computational Accuracy

Utilization  of  Terrain 
Features 

Number  of  Sites  7-Tactical  Skills 

Evaluated 

Thoroughness  of  Runway 
Report 

Total  Score  7-Tactical  Skills 


8-Technical  Skills 
5-Mission  Persistence 


Combat 


Security  Mission 

Firm  Handling  of 

Personnel 

Effectiveness  of  Defense 
Plan 

Total  Score 

2-Combat  Leadership 

Roadblock 

Attitude & Motivation
Tactical  Control 

Instruction  of  Men 

Handling  of  Sniper 

3-Team Leadership

Confidence  & 

Forcefulness 

2-Combat  Leadership 

Effectiveness in

Establishing Abatis

3-Team Leadership

Marker  for  intermediate  factor  30-Comco  &  Staff 

g 

Independent  variable 

Q 

Marker for intermediate factor 23-Mission Accomplishment


Results  and  Discussion 


It was decided that the best use of the existing data would be to
determine how effectively the OEC variables could discriminate between the
group of participants who chose to get out of the Army after their initial
obligation and the group who decided to remain for a full career term. The
decision(s) to remain in the Army is the fundamental criterion of a
successful military career. It is the summary outcome of all the skills,
motivations, experienced successes, etc. which allow one to choose and
successfully complete a given life's work. Any set of variables potentially
able to detect fine differences in level of success, such as one-time
ratings or awards, should also be able to detect differences in this basic
yet overriding criterion.

The group of 101 career officers for the analysis was self-defined.
However, the discharged group required some further definition. Of 412
cases available, we were actually able to get data on 352. Of these, by far
the majority (237) fit the pattern of a minimal 2-year active duty
commitment and completion of the remainder of their obligation in some type
of reserve unit. (As previously mentioned, we have not been able to obtain
data on anyone who may still be maintaining his reserve status.)

It was decided to use the homogeneous sample of 237 for the second group.
An informal perusal of the records indicated that those having more or less
than two years of active duty represented a much more ill-defined group.
These included officers killed in Vietnam, West Point graduates leaving
after their minimal 5-year commitment, medical discharges and a variety of
unique cases.

A  stepwise  discriminant  analysis  was  performed  using  the  "2-year"  and 
"20-year"  career  groups  described  and  a  significant  discriminant  function  was 
found.  The  value  of  Wilks'  lambda  was  .89  with  a  corresponding  chi-square  of 
35.54  (d.f.  =  7;  p  <  .001).  The  canonical  correlation  was  .318.  However, 
neither  of  these  statistics  indicates  a  very  high  degree  of  separation 
between  the  groups. 
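
The reported statistics can be cross-checked with a short calculation. The sketch below is an illustration only: it assumes the standard two-group identity (Wilks' lambda equals one minus the squared canonical correlation) and Bartlett's chi-square approximation, and it assumes all 101 + 237 cases entered the analysis. The paper does not state which formula its software used, and the function names are hypothetical.

```python
import math


def wilks_lambda_two_group(canonical_r: float) -> float:
    """With one discriminant function, Wilks' lambda = 1 - Rc**2."""
    return 1.0 - canonical_r ** 2


def bartlett_chi_square(wilks_lambda: float, n_cases: int,
                        n_vars: int, n_groups: int) -> float:
    """Bartlett's approximation: -(N - 1 - (p + g) / 2) * ln(lambda),
    with p * (g - 1) degrees of freedom."""
    multiplier = n_cases - 1 - (n_vars + n_groups) / 2.0
    return -multiplier * math.log(wilks_lambda)


# Values taken from the text: Rc = .318, 7 variables, groups of 101 and 237.
lam = wilks_lambda_two_group(0.318)
chi2 = bartlett_chi_square(lam, n_cases=101 + 237, n_vars=7, n_groups=2)
print(f"lambda = {lam:.3f}, chi-square = {chi2:.2f}, d.f. = {7 * (2 - 1)}")
```

Under these assumptions lambda comes out near .90 and the chi-square near 35.4, close to the reported .89 and 35.54; the small differences are consistent with rounding of the published canonical correlation.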

Standardized  function  coefficients  are  shown  at  Table  2  for  the  7  variables 
of  the  total  25  included  in  the  function.  These  show  the  relative  contributions 
of  each  variable  to  the  function.  By  looking  back  to  Table  1,  one  can  determine 
the  factors  from  the  original  analysis  on  which  these  variables  loaded.  It  is 
interesting  to  note  that  while  only  10  of  the  25  Summary  variables  were  reported 
as  loading  on  the  final  factors  of  the  original  analysis,  5  of  the  7  appearing  in 
the  current  analysis  came  from  these  10. 

The cross-comparison also helps to lend interpretation to the function. To
the extent that these variables are indicative of the original factors of
combat and team leadership, tactical skills and mission persistence, the
military careerists appear to be distinguished from the other group along a
general "military leadership" dimension.
TABLE 2

Standardized Discriminant Function Coefficients and Group Means

First Discriminant Analysis (2 vs. 20 years)

                                                  Standardized       Group Means
Variable                                          Coefficient   20-yr. Active  2-yr. Active

Airfield Layout-No. of sites evaluated (1a, 2)       -.50            5.06          5.24
Automotive Inspect.-Factor Total (1b, 2)          [illegible]        9.34          8.16
Roadblock-Confidence & Forcefulness (1c)              .80           28.61         25.03
Security Mission-Total Score                          .44          297.71        254.11
Airfield Layout-Comput. Accuracy                     -.30             .55           .64
Roadblock-Instruction of Men (1d)                    -.34           10.71          9.89
Site Selection-Factor Total                           .23           10.12          9.35

Second Discriminant Analysis (Discharged as 1LT vs. CPT)

                                                  Standardized       Group Means
Variable                                          Coefficient      Captain    1st Lieutenant

Airfield Layout-No. of sites evaluated (1a, 2)       -.38            5.18          5.33
Automotive Inspection-Factor Total (1b, 2)            .53            9.25          7.53
Roadblock-Attitude & Motivation                       .57           29.48         27.20
Roadblock-Handling of Sniper                         -.32            4.82          5.26
Roadblock-Tactical Control                           -.38            3.12          3.34
Highway Traffic Plan-Attn. to Data (1e)              -.29            3.88          4.23
Airfield Layout-Util. Requirements of
  Terrain Features                                [illegible]        1.67          1.45

1  These variables loaded on factors in the original analyses: a) Tactical
Skills; b) Mission Persistence; c) Combat Leadership; d) Team Leadership;
e) Technical-Managerial Leadership.

2  These variables were included in both discriminant functions.
Following determination of the discriminant function, its ability to
correctly classify cases was examined. Assuming the prior probability of
group membership to be at the chance (50%-50%) level, 65.38% of cases were
correctly classified using the discriminant function. This represents a
30.76% improvement over chance. Classification results are shown below:

                          Predicted Career Group
Actual Career Group       20-year         2-year

20-year                   68 (67.3%)      33 (32.7%)
2-year                    84 (35.4%)      153 (64.6%)
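
The hit rate and the improvement-over-chance figure follow directly from the classification table. The sketch below shows the arithmetic; the function name is hypothetical, and "improvement over chance" is assumed to mean the gain relative to an equal-priors baseline, which is the definition that reproduces the reported figures.

```python
def classification_summary(table):
    """Return (hit rate, relative improvement over equal-priors chance).

    `table` is a square list of lists: rows are actual groups,
    columns are predicted groups, in the same order.
    """
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    hit_rate = correct / total
    chance = 1.0 / len(table)          # 50% for two groups
    improvement = (hit_rate - chance) / chance
    return hit_rate, improvement


# Rows: actual 20-year, actual 2-year; columns: predicted 20-year, 2-year.
career_table = [[68, 33],
                [84, 153]]
hit_rate, improvement = classification_summary(career_table)
print(f"{hit_rate:.2%} correctly classified")
print(f"{improvement:.2%} improvement over chance")
```

Computed without intermediate rounding, the improvement is a shade above the reported 30.76%, which results from rounding the hit rate to 65.38% first. The same function applies to any square classification table laid out the same way.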

Lack of information and the considerable time span involved make it
difficult to discuss those considerations that normally go with
classification. For example, the large difference in group sizes would
suggest improvement in overall classification through the use of prior
probabilities of group membership other than chance. However, the most
appropriate percentages to use were not readily available to us. They would
be the statistical projections of officer retention of twenty years ago.
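
To illustrate how unequal priors change the chance baseline, the sketch below computes two commonly used reference values from group sizes: the proportional chance criterion (sum of squared group proportions) and the maximum chance criterion (proportion in the largest group). It uses the observed sample sizes of 101 and 237 purely as stand-in priors; as noted above, these are not the retention projections that would be most appropriate, and the function name is hypothetical.

```python
def chance_baselines(group_sizes):
    """Return (group proportions, proportional chance, maximum chance)."""
    total = sum(group_sizes)
    proportions = [n / total for n in group_sizes]
    # Expected hit rate when cases are assigned at random in
    # proportion to group size.
    proportional_chance = sum(p * p for p in proportions)
    # Hit rate obtained by assigning every case to the largest group.
    maximum_chance = max(proportions)
    return proportions, proportional_chance, maximum_chance


# Stand-in priors: the sample's own 20-year and 2-year group sizes.
proportions, c_pro, c_max = chance_baselines([101, 237])
print(f"group proportions:   {proportions[0]:.3f}, {proportions[1]:.3f}")
print(f"proportional chance: {c_pro:.3f}")
print(f"maximum chance:      {c_max:.3f}")
```

With equal group sizes both baselines reduce to the 50% level assumed in the analysis.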

Those factors affecting tolerance for misclassification have also changed.
The Selective Service System was still in effect in the early 1960's. Under
that system the loss of a potentially successful officer through
misclassification might have been much less costly than it is today.

Following the initial analysis, the 2-year sample of officers deciding not
to remain in the Army was examined. This sample revealed a bimodal
distribution along the dimension of rank at the time of discharge from the
reserves. Of the 222 officers for whom records were available, 84 were
discharged as Captains and 137 as First Lieutenants.

Assuming this to be an indicator of military success, a second stepwise
discriminant analysis was performed using groups formed on the basis of rank
at the time of reserve discharge. A significant discriminant function was
found with Wilks' lambda = .90, chi-square = 21.58, d.f. = 7, p = .003
(canonical correlation = .308).

The accuracy of classification was checked and 67.87% of cases were
correctly classified for an improvement of 35.74% over chance.
Classification results are shown below.

                            Predicted Rank at Discharge
Actual Rank at Discharge    1LT             CPT

1LT                         93 (67.9%)      44 (32.1%)
CPT                         27 (32.1%)      57 (67.9%)
Standardized discriminant function coefficients shown at Table 2 indicate
the relative contributions of variables to the function. As with the first
analysis, exactly 7 of the 25 variables are included in the function.
However, except for the two variables marked, the sets of variables
belonging to the two separate analyses do not overlap. Also, the variables
in the latter analysis tend not to be the ones which loaded on factors in
the original analysis. (The exceptions are the two overlapping variables
mentioned and the "highway" variable.)

This would suggest that the dimension(s) separating the career-bound young
officer from the one who will leave for civilian life may not be entirely
the same as those determining success as a young officer. One obvious
difference might be the factor of motivation. Many of these officers may
have been bright and capable, yet only interested in fulfilling their
minimal military obligation. However, discussion at this point would be
speculative rather than truly data-based.

In conclusion, it appears somewhat remarkable that OEC measures given so
early after entry into the Army were able to measure something of what
distinguishes a future career officer from a non-careerist. Given a few
more years we will be able to determine how well these variables can
discriminate among the successful and the "super-successful," i.e., those
officers who become colonels and generals rather than retiring as
lieutenant colonels. Perhaps the best is yet to come.
The Problem of Range Restriction in Test Validation


This paper examines some of the legal arguments advanced over
the years with respect to the interpretation of the validity
coefficient. The issue as to whether or not the validity coefficient is of
a magnitude large enough to have "practical significance" is given
considerable attention. The Equal Employment Opportunity Commission
has argued that the validity coefficient is usually too low to have
practical significance. Educational Testing Service (ETS), however,
has contended that the validity coefficient does have practical
significance but that it should be corrected for restriction in range. The
Department of Justice claimed that range restriction correction
formulas cannot be used because the assumptions underlying them cannot be
met. The Department of Justice in its defense cited the Division 14
Principles for Validation and Use of Personnel Selection Procedures,
which assert the desirability of having validation samples be as
similar as possible to the applicant pool. Thorndike's range restriction
correction formulas and their underlying assumptions are carefully
reviewed in this paper. These formulas are applied to data on actual
selection instruments to obtain the estimated true validity coefficient
that would be obtained if the validity coefficient were based on the
total applicant population. Also, criterion-referenced tests are
discussed and suggested as viable alternatives to norm-referenced
tests, along with factors contributing to criterion biases.
Finding ways to improve selection systems has been at the forefront
of personnel psychology for many years. Very sophisticated selection systems
have been developed over the years but most have lacked good validity. The
most recent decision of the Justice Department with respect to the
Professional Administrative Career Exam (P.A.C.E.) has called for more tests
to be developed which have higher validity and less adverse impact on
minority group members.

Boldt (1977) has pointed out the practical significance that the validity
coefficient plays in the legal process through the EEOC guidelines. The EEOC
guidelines asserted that test validity must not only be statistically
significant but also of a magnitude sufficient to suggest that the benefits
obtained from using a particular selection device are worth the trouble of
using it. Boldt has also pointed out that the Educational Testing Service
has asserted that the correlation coefficient was the appropriate statistic
but should be corrected for restriction in range.

The Department of Justice claimed that correlations between selection
devices and criteria should not be corrected for range restriction because
the assumptions of homoscedasticity, linearity, and normality cannot be met.
This claim, however, raises an important question as to which validity
coefficient is the best, the uncorrected or the corrected. The purpose of
this paper is to shed some light on some of the problems associated with
validity coefficients and provide an overview of three main cases of
correcting for range restriction in testing.


Generally, when a test is administered, it is administered to applicants
who walk into a testing center and take a test. In some situations the
tester may have control over the applicant pool, but this very rarely
occurs. Since the total number of people who take the test make up
the applicant pool, the normative information which is obtained from the
sample should be based on the total applicant pool. Very seldom, however,
are validity coefficients provided for the entire applicant group to whom
the test is administered. Consequently, validity coefficients can only
be obtained for those people who are selected for the job. When this occurs,
the range of test scores becomes restricted, and the sample ceases to be
a representative sample of the general population of applicants; its
results thus cannot be generalized to the total applicant population.

Range restriction, as applied to test scores, is a general term which
means that the test scores for a particular group are concentrated in only
a portion of the possible range of scores (Kaufman, 1972). Groups that
are restricted in range have smaller standard deviations than groups that
are not restricted. Also, when test scores are restricted in range, the
correlation between a test and a criterion will be lower than the
correlation for the unrestricted group.
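
The effect described above can be reproduced with a small simulation. The sketch below is a hypothetical illustration, not data from any study cited here: it assumes a standard-normal test score, a criterion built to correlate about .60 with the test in the full applicant pool, and selection of the top 13% of applicants on the test alone.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def std_dev(xs):
    """Population standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


rng = random.Random(0)        # fixed seed so the run is repeatable
TRUE_R = 0.60                 # assumed applicant-pool validity
N_APPLICANTS = 20000
SELECT_FRACTION = 0.13        # assumed selection ratio

test = [rng.gauss(0, 1) for _ in range(N_APPLICANTS)]
crit = [TRUE_R * t + math.sqrt(1 - TRUE_R ** 2) * rng.gauss(0, 1)
        for t in test]

# Keep only applicants at or above the cutoff on the test.
cutoff = sorted(test)[int((1 - SELECT_FRACTION) * N_APPLICANTS)]
selected = [(t, c) for t, c in zip(test, crit) if t >= cutoff]
sel_test = [t for t, _ in selected]
sel_crit = [c for _, c in selected]

r_full, r_restricted = pearson_r(test, crit), pearson_r(sel_test, sel_crit)
sd_full, sd_restricted = std_dev(test), std_dev(sel_test)
print(f"full pool:      SD = {sd_full:.2f}, r = {r_full:.2f}")
print(f"selected group: SD = {sd_restricted:.2f}, r = {r_restricted:.2f}")
```

In the selected group both the standard deviation of the test and its correlation with the criterion are noticeably smaller than in the full pool, even though the underlying test-criterion relationship is unchanged.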

When a high standard of selectivity is employed, the effect of range
restriction on the resulting validity coefficients becomes even more
profound. Thorndike's (1949) frequently cited wartime study is a good
illustration of this relationship. In this study, a battery of tests to
predict success in pilot training was given to a large group of men as a
part of an Army Air Force Aviation Psychology Program. Under the strict
selection standards that were in effect toward the end of World War II,
only 13% of these men would have qualified to enter pilot training on the
basis of test scores. However, all of the men, regardless of their test
scores, were allowed to enter training for experimental purposes. Table I
below shows the correlation coefficients between test scores and the
criterion for the total and qualified groups entering training.


TABLE I

                                      Correlation with Criterion

                                      Total Group    Qualified Group
Predictor                             (N = 1036)     (N = 136)

Pilot Stanine (Composite Score)          .64            .18
Mechanical Principles Test               .44
Complex Coordination                     .40
Instrumental Comprehension Test          .45            .27
Finger Dexterity Test                    .18            .00
General Information Test                 .46            .20
Arithmetic Reasoning Test                .27            .18

From Thorndike, 1949, page 171


These results show the effect that range restriction can have on validity
coefficients when restricted and unrestricted groups are compared.
According to Table 1, the composite aptitude score has the highest
correlation (.64) when the total group is taken into consideration, but a
substantially lower correlation (.18) when the group is restricted.
Judging by the qualified group alone, the Complex Coordination Test and
the Mechanical Principles Test were among the worst predictors. However,
for the total group, both the Mechanical Principles and Complex
Coordination Tests were among the best predictors.

In order to make practical use of validity statistics for a restricted
group, it is necessary to have statistical correction procedures to
estimate what validity coefficients would have been obtained if it had
been possible to obtain test and criterion data from a representative
sample of those to whom the selection device was applied. Thorndike (1949)
discussed three types of correction procedures for range restriction:
Case I, Case II, and Case III.

Case I occurs when there is some degree of truncation on the criterion
(Schmidt and Hunter, 1977; Gulliksen, 1950). In most practical situations,
a test or multiple variables, such as exceptionally good recommendations
and exceptionally good academic records, are used for selection. However,
there may also be situations wherein the criterion itself is used to
select employees.

For example, a manufacturing company may wish to develop a test to predict
job performance and may accept all applicants regardless of their test
scores, but weed out the bottom 50% whose performance falls below a given
minimum standard. The Case I model is rarely used in validity studies
because selection is very seldom made on the basis of the criterion.
Selection is almost always made on some type of test. Nevertheless, this
case can be put into practice by selecting on the criterion at a point
which corresponds to the cutting score where selectees have been proven to
be successful. Figure 1 is an illustration of selection occurring on the
criterion.

Figure 1. An illustration of selection occurring on the criterion (Y).

Let Y represent the criterion, and X the predictor. Suppose that the
selection ratio is set at .05; then, if the assumptions of linearity and
homoscedasticity hold, setting a cutting score at the top 5% on the
criterion will correspond to the top 5% if the test were used to select
employees. The dotted and solid lines together in Figure 1 show the
ellipse for the total unrestricted group, and the solid line alone
represents the ellipse for the restricted group. The lower the selection
ratio (that is, the smaller the proportion selected), the more severe the
restriction in range. Essentially, this is a case of regression of X on Y,
instead of the more typical case of regression of Y on X. While this
approach tends to be somewhat costly, it does offer an alternative to
written tests.

Case II

The Case II situation occurs when there is direct truncation on the
predictor variable. This model underestimates the effect of range
restriction if selection is partly on the basis of the criterion as well
as the predictor. However, the Case II formula provides slight
overestimates of the corrected coefficients when truncation on the test is
not perfect, for example, when some applicants below the cutoff score are
selected, or when selection is made on the basis of the sum of scores on
two or more tests rather than a single test (Schmidt and Hunter, 1977).

Table 2 was developed by using the Case II formula (Appendix A), and
facilitates the determination of R12, the estimated correlation between
predictor and criterion in an unrestricted sample.

TABLE 2

Validity Coefficients for Unrestricted Group (R12) Estimated From
Values for Restricted Group (r12)

                                   r12
S1/s1     .10    .15    .20    .25    .30    .35    .40    .45

 1.25     .12    .19    .25    .31    .37    .42    .48    .53
 1.50     .15    .22    .29    .36    .43    .49    .55    .60
 1.75     .17    .26    .34    .41    .48    .55    .61    .66
 2.00     .20    .29    .38    .46    .53    .60    .66    .71
 2.50     .24    .35    .45    .54    .62    .68    .74    .78
 3.00     .29    .41    .52    .61    .69    .75    .79    .83
 4.00     .37    .52    .63    .72    .78    .83    .87    .90
 5.00     .45    .60    .71    .79    .84    .88    .91    .93
10.00     .71    .83    .90    .93    .95    .97    .97    .98

From Kaufman, 1972, page 6.

S1/s1 is the ratio of the standard deviation of the unrestricted group
to the standard deviation of the restricted group on the predictor
test; r12 is the actual obtained validity coefficient for the
restricted group. The unrestricted coefficient is found by looking
up the ratio of the unrestricted to the restricted standard deviation
at the left of the table and the restricted coefficient at the top of
the table. For example, if the ratio of the standard deviations was
found to be 1.25, and the restricted validity coefficient was .25,
then Table 2 would be entered as follows: S1/s1 = 1.25, r12 = .25,
which results in an unrestricted validity coefficient (R12) of .31.

The magnitude of a Pearson product-moment correlation coefficient
is directly related to the standard deviations of the two variables
being correlated. A reduction in either or both of the standard
deviations will lower the correlation coefficient between the two
variables.
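
The table lookup can also be done by direct computation. The sketch below assumes the usual statement of Thorndike's Case II correction, R12 = r12 (S1/s1) / sqrt(1 - r12^2 + r12^2 (S1/s1)^2); the function and argument names are hypothetical. It reproduces the worked example from Table 2.

```python
import math


def correct_case2(r_restricted, sd_ratio):
    """Estimate the unrestricted validity coefficient (Case II).

    r_restricted: validity coefficient observed in the restricted group (r12).
    sd_ratio: unrestricted SD divided by restricted SD on the predictor (S1/s1).
    Assumes direct selection on the predictor, linearity and homoscedasticity.
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("r_restricted must lie between -1 and 1")
    numerator = r_restricted * sd_ratio
    denominator = math.sqrt(
        1 - r_restricted ** 2 + (r_restricted ** 2) * (sd_ratio ** 2)
    )
    return numerator / denominator


# Worked example from the text: S1/s1 = 1.25 and r12 = .25.
estimate = correct_case2(0.25, 1.25)
print(round(estimate, 2))  # 0.31, matching the Table 2 entry
```

A ratio of 1.0 (no restriction) leaves the coefficient unchanged, which is a quick sanity check on any implementation of the formula.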

Case III

The third case of range restriction occurs when there is truncation
on a third variable. Gulliksen (1950) distinguishes between explicit
selection and incidental selection. Explicit selection is defined as
direct selection occurring on the basis of a given variable (test), and
incidental selection occurs when there is indirect selection occurring
on the basis of a given variable (criterion) or another test which is
highly correlated with the explicit variable. For instance, suppose a
researcher is interested in trying out a new test (Y) to see how well it
predicts job performance, and scores are available for the first test (X)
which was used for selection. Selection is then incidental with respect to
test Y (the new test) because of the high correlation between the new test
and the initial test; selecting on the first test is thus basically the
same as selecting on the second test. Restriction in range occurs on the
new research test because there will be some applicants who passed the
first test who will not pass the second test. The Case III situation is
more practical than the Case I situation and is very frequent in
concurrent validity studies.

Thorndike (1949) discussed another Case III in which one of the
correlations used in the Case III formula may be available for the total
group rather than the restricted group. For example, a research test may
have been given to a general unselected group and its correlation with the
score on the selection test is based on this group (see Appendix A for the
Case III formula).
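
As a rough sketch of a Case III correction, the code below assumes the commonly cited form of Thorndike's three-variable formula, in which x is the explicit selection test, y the new research test, z the criterion, and the only unrestricted quantity needed is the standard deviation of x. The function name, argument names, and example values are hypothetical, and the notation may differ from the formula given in Appendix A.

```python
import math


def correct_case3(r_yz, r_xy, r_xz, sd_ratio_x):
    """Estimate the unrestricted correlation between y and z (Case III).

    r_yz, r_xy, r_xz: correlations observed in the restricted group.
    sd_ratio_x: unrestricted SD divided by restricted SD on the explicit
        selection variable x.
    Assumed form:
        R_yz = (r_yz + r_xy * r_xz * k)
               / sqrt((1 + r_xy**2 * k) * (1 + r_xz**2 * k)),
        with k = sd_ratio_x**2 - 1.
    """
    if sd_ratio_x <= 0:
        raise ValueError("sd_ratio_x must be positive")
    k = sd_ratio_x ** 2 - 1.0
    numerator = r_yz + r_xy * r_xz * k
    denominator = math.sqrt((1 + r_xy ** 2 * k) * (1 + r_xz ** 2 * k))
    return numerator / denominator


# Hypothetical values: the new test correlates .20 with the criterion in
# the selected group, and selection on x halved the standard deviation of x.
corrected = correct_case3(r_yz=0.20, r_xy=0.50, r_xz=0.30, sd_ratio_x=2.0)
print(round(corrected, 3))
```

Two properties make this form easy to check: with no restriction (ratio of 1.0) the observed coefficient is returned unchanged, and when y is identical to x (r_xy = 1 and r_yz = r_xz) the expression reduces to the Case II correction.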

These formulas are basically used for Pearson product-moment correlations
based on continuous variables; however, if the formula for biserial
correlation is to be used in restricted groups, the trait underlying the
dichotomy must be normally distributed in the restricted groups
(Thorndike, 1949).

Campbell (1976) and Linn (1968) have pointed out that attempting to
estimate reliabilities and validities in the appropriate population can
pose a serious problem. For instance, they argue that the assumptions of
linearity and homoscedasticity (equality of conditional variances) may
easily be violated.

For example, in a bivariate distribution, much less variability tends to
be exhibited at the extremes than in the middle, and so, rather than being
linear, the standard-score regression line tends to be steeper at the ends
than in the middle. The violation of the homoscedasticity assumption tends
to inflate the corrections, while the departure from linearity tends to
deflate them. Linn (1968) has shown that correcting correlation
coefficients as if the test (x) were the explicit selection variable
results in the formula overcorrecting when the correlation between x
(test 1) and z (criterion) is low and undercorrecting when the correlation
is in the middle range. The undercorrection and overcorrection phenomena
become more profound as the degree of actual selection on z (unavailable
predictor) becomes extreme. This relationship is shown in Figure 2 below,
which was abstracted from the Linn (1968) report.


[Figure 2: horizontal axis R_xy, correlation of x and y.]

Figure 2. An illustration of the effects of correction for restriction of
range as a function of the correlation between the explicit and implicit
predictors.


Let x be the explicit selection variable (test used for selection), y the
implicit selection variable (a second test), and z the criterion, which is
also subject to implicit selection. Then the correlation between y and z
in the selected group (r_yz) is a function of the correlation between x
and y in the unrestricted group (R_xy). The broken line in Figure 2 shows
the value of r_yz in the restricted population as a function of R_xy for
values of R [subscript not legible in the source]. If y is treated as an
explicit selection variable and r_yz is corrected for homogeneity of
variances of y in the restricted population, the corrected value of r_yz
will still be an underestimate of R_yz as long as x and y are reasonably
correlated.

A number of studies have shown that the assumptions of linearity,
normality and homoscedasticity are very rarely violated. Sevier (1957),
using sample sizes ranging from 105 to 250, has shown that out of 24 tests,
only one violated the assumption of linearity, and one out of eight
violated the assumption of homoscedasticity. Ghiselli and Kahneman (1962)
examined 60 aptitude variables in a sample size of 200 and showed that 40
percent of the variables departed significantly from the
linear-homoscedastic model. However, ninety percent of these variables
held up when cross validated. Tupes (1964) re-analyzed the Ghiselli and
Kahneman studies and found that only 10 percent of these relationships
departed from linearity at the .05 level.

Schmidt (1973) has pointed out that the assumptions for range restriction
correction formulas will be violated only infrequently in real data.
However, even if they are violated (e.g., lower conditional criterion
variances in the restricted group), there is every reason to believe that
the amount of induced bias will be small relative to the massive bias
induced by failure to correct. This point is illustrated by re-examining
the figure abstracted from the Linn (1968) article (note 1). Campbell
(1976), however, reviewed the Linn article and came to the same conclusion
as did Linn. He then further emphasized that these kinds of considerations
make using correction formulas a rather risky business. In almost every
real situation an estimate of the population parameter will most likely be
biased in some respect, except in clear and straightforward situations.

1. As noted by Schmidt (1973), the Linn article is very confusing because
figure 2 has been called figure 1, and vice versa. However, in the example
on which figure 1 is based, a very serious violation of the assumptions is
made. In this example, restriction has been on a third variable (X), and
so the appropriate model is Thorndike's (1949) Case III formula. However,
variable X is unknown and thus the correct correction formula cannot be
used. The incorrect Case II model is used instead, which assumes direct
restriction on the predictor.

Criterion-Referenced  Tests 

Perhaps a better way to ensure that selection systems have validity is to
make sure that the test is an adequate sample of the job content.
Criterion-referenced tests, although less sophisticated than
norm-referenced tests, provide better samples of the job domain. These
tests are designed to measure performance relative to a criterion
standard. That is, an employee's test performance is compared to an
established criterion performance standard. Normative data as to how well
one employee is performing relative to another are not necessary. The job
is usually simulated with 90% accuracy, so that the performance on the
test is basically the same as the actual performance on the criterion.
Criterion-referenced tests are basically the same as performance tests.

Job performance tests such as work sample tests are especially suitable
for use as criterion-referenced measures (Buck, 1975). These tests, like
criterion-referenced tests, must sample the job domain accurately and also
must measure the employee's ability to perform critical job tasks. As Buck
(1975) has pointed out, criterion-referenced tests are often confused with
criterion-related tests. A criterion-related test implies that there is
some kind of statistical relationship between a test and some measure of
job performance (e.g., supervisory ratings, self ratings, peer ratings,
etc.). Criterion-referenced tests, on the other hand, refer to the minimal
acceptable level that an applicant must meet in order to achieve a mastery
level on the job. Essentially, criterion-referenced tests are used to
determine who is qualified; they do not measure variability in
performance. Popham and Husek (1971) contend that criterion-referenced
tests are only suitable when there are no constraints on how many people
are to be selected for a particular job. However, when there are
constraints, norm-referenced tests are usually more appropriate for
selection.

Like norm-referenced tests, criterion-referenced tests can also be used
when the applicant pool is exceptionally large, by setting the cutting
score above the minimally qualifying standard. For example, if the
minimally qualifying cut score is 70%, it can be raised to 90%, which would
ensure the adoption of a treatment that would select superior employees.
The advantages of this approach to testing over norm-referenced tests are
the accuracy with which the job domain is sampled and the fact that less
complex tests can be developed, which in turn would give better measures
of validity.

APPENDIX A


Thorndike's Case I formula:

$$R_{xy} = \sqrt{1 - \frac{s_y^2}{S_y^2}\left(1 - r_{xy}^2\right)}$$

The Case I formula is primarily concerned with the correlation between x
and y, where R_xy is the correlation between a test (x) and a criterion (y)
in the unrestricted group, r_xy is the same correlation in the restricted
group, S_y is the standard deviation of the criterion in the unrestricted
group, and s_y is the standard deviation of the criterion in the
restricted group.
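
A minimal sketch of the Case I computation follows. It assumes the commonly cited form of the correction, R_xy = sqrt(1 - (s_y^2 / S_y^2)(1 - r_xy^2)), which uses only the restricted and unrestricted standard deviations of the criterion; the function name, argument names, and example values are hypothetical.

```python
import math


def correct_case1(r_restricted, sd_restricted_y, sd_unrestricted_y):
    """Estimate the unrestricted correlation (assumed Case I form).

    r_restricted: correlation between test and criterion in the restricted
        group (r_xy).
    sd_restricted_y, sd_unrestricted_y: criterion standard deviations in the
        restricted (s_y) and unrestricted (S_y) groups.
    The square root loses the sign, so the sign of r_restricted is restored.
    """
    if sd_restricted_y <= 0 or sd_unrestricted_y <= 0:
        raise ValueError("standard deviations must be positive")
    variance_ratio = (sd_restricted_y / sd_unrestricted_y) ** 2
    magnitude = math.sqrt(1 - variance_ratio * (1 - r_restricted ** 2))
    return math.copysign(magnitude, r_restricted)


# Hypothetical values: the criterion SD in the selected group is 80 per
# cent of the SD in the full group, and the observed correlation is .30.
print(round(correct_case1(0.30, 0.8, 1.0), 3))
```

When the two standard deviations are equal, the function returns the observed coefficient, as expected when no restriction has taken place.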


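Thorndike's Case II formula:

$$R_{xy} = \frac{r_{xy}\,(S_x/s_x)}{\sqrt{1 - r_{xy}^2 + r_{xy}^2\,(S_x^2/s_x^2)}}$$

where S_x and s_x are the standard deviations of the predictor in the
unrestricted and restricted groups.


A special case of Case III when the validity coefficient is known for the
unrestricted group:

R_xy = [right-hand side not legible in the source]

The terms are defined as in the Case I formula.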


Thorndike's Case III formula:

[formula not legible in the source]

These symbols are analogous to those of the Case II formula, with the
exception of the third variable being correlated. R_xy is the correlation
between the explicit variable (test) and the implicit variable (a new
research test) for the unrestricted group, r_xy is the correlation between
the explicit variable and the implicit variable for the restricted group,
r_xz is the correlation between test X and the criterion Z for the
restricted group, r_yz is the correlation between the implicit variable
(new research test) and the implicit variable (criterion Z) for the
restricted group, and S_z^2 and s_z^2 are the unrestricted and restricted
group variances for the criterion Z.


of Readiness for Reenlistment
Federal Armed Forces


Adelheid  Meissner  and  Klaus  J.  Puzicha 
Federal  Armed  Forces  Office 
(Germany) 


Summary 

In December 1980 a representative sample of soldiers of all
three forces of the German Federal Armed Forces (ranks:
private to sergeant) was questioned. The topic of this research
work was the analysis of determinants of job satisfaction and
readiness for reenlistment. Controlling for the variables
"force" and "status" (compulsory service vs. enlistment for
two years vs. enlistment for three or four years vs. enlistment
for five years or longer) we have analysed the most important
determinants by multiple regression.

The main determinants of job satisfaction and readiness for
reenlistment are: affinity for military concerns, perception
of the military superiors and the conditions of work. Readiness
for reenlistment in particular has additional social determinants:
the attitude of the wife or girlfriend and personal
contacts with the union leader.




1  Problem 

The German Federal Armed Forces comprise about 500,000 soldiers.
According to a NATO agreement, 55 per cent of them should be enlisted
men. In fact, however, more than 50 per cent of those now serving with
the Armed Forces are draftees. This surplus compensates for a lack of
corporals, so that in the end the soldier under compulsory service has
to take on the function of a corporal. (1)

This was the situation until the year 1978. Meanwhile the personnel
development has changed:

-  In the Army and in the Navy there is a shortage of middle- and
long-term volunteers, i.e. of soldiers who enlisted for three
or more years. In these two forces the lack can be compensated
by an increasing quota of short-term volunteers (enlistment for
two years). But this compensation is only a quantitative one;
there still remains the problem that young soldiers with
short enlistment are overtaxed.

-  The personnel situation in the Air Force is characterized by
a considerable deficiency of short-term enlisted men (2 years)
and long-term volunteers (8 years and longer) on the one hand and
a surplus of soldiers with medium enlistment (3 to 7 years)
on the other hand.

One problem of the German Armed Forces in the eighties is how to
meet the need for enlisted men in a satisfactory way. In principle
there are three possibilities:

-  enlistment of volunteers

-  change of status from draftee to volunteer

-  reenlistment of voluntary soldiers.

During recent years each of the three sources mentioned has covered
about one third of the requirement for volunteers. (2)

The present research is concerned with the problem of deficient
readiness for reenlistment by volunteers and for enlistment by draftees.
We have summarized both aspects as READINESS FOR REENLISTMENT.
We consider this problem as an indicator of JOB SATISFACTION,
which was the other main subject of our investigation.

The following survey shows the design we have applied to analyse
these two aspects.


DEPENDENT VARIABLES
(Criteria)

relevant for the individual

-  satisfaction with superiors

-  satisfaction with conditions of work

-  satisfaction with comrades

-  satisfaction with payment

relevant for the institution

-  readiness for reenlistment

-  efficiency as multiplicators

-  extrinsic motivatibility

INDEPENDENT VARIABLES
(Predictors)

more subjective ones

-  personality traits

-  perception of the superiors

-  perception of the conditions of work

-  perception of the comrades

more objective ones

-  biographic data

-  objective aspects of the military job

CONTROLLING VARIABLES

-  forces (Army vs. Air Force vs. Navy)

-  status (service under compulsory conditions vs. enlistment
for two years vs. enlistment for three or four years vs.
enlistment for five years or longer)


2  Methods 

At the end of 1980 an investigation concerning the problems mentioned
was carried out by the German Armed Forces' Psychological
Research Institute, Bonn. On the basis of randomised sample
selection, 1500 soldiers were questioned by mail using a pretested
questionnaire. The sample was representative for the ranks
from private to sergeant and contained both draftees and
volunteers.


Description of sample

               Army     Air Force     Navy

servicemen      643        436         152
corporals        85         -*          34
sergeants       121         -*         116

The response rate was about 42 per cent.

* Because of a technical error, only servicemen were available in the
Air Force.


To analyse the determinants of job satisfaction and of readiness for
reenlistment, multivariate methods such as multiple regression
analysis seemed adequate. As criteria (that is, our dependent
variables) we have taken the following dimensions, which we obtained
by factor analysis of some selected measuring instruments:

-  Satisfaction with the superiors

-  Satisfaction with the conditions of work

-  Satisfaction with the comrades

-  Satisfaction with the payment

-  Readiness for reenlistment

-  Efficiency as multiplicators

-  Extrinsic motivatibility

Some of these dependent variables have been used as predictors
too. In the following all applied predictors are specified.


Objective aspects of the military job

1  Rank
2  Seniority
3  Future time of service
4  Living in barracks
5  Ties of reversion
6  Shifting

Objective aspects of biography

7  Level of education
8  Failing remove
9  Confession
10  Family status (married)
11  Distance from wife/girlfriend
12  Attitude of wife/girlfriend towards reenlistment in the Armed Forces
13  Unemployment of colleagues or friends
14  Satisfaction with the educational and professional level

Perception of the superiors

15  Satisfaction with the superiors
16  Authoritarian superiors
17  Number of contacts with the union leader
18  Foreign-determination

Perception of the conditions of work

19  Satisfaction with the conditions of work
20  Adequate function
21  Burden by special missions and long-term working hours
22  Idleness

Perception of the comrades

23  Satisfaction with the comrades
24  Alcohol and missing comradeship
25  Alcohol consumption

26  Satisfaction with the payment

Personality traits

27  Expectations before joining the Armed Forces
28  Prejudices towards the Armed Forces
29  Affinity for military concerns
30  Unpolitical attitude
31  Readiness for leadership/dominance
32  Need for achievement
33  Immobile passivity


3  Results 


In a first descriptive view of the results it seems opportune to
give a kind of inventory, i.e. to address such questions as "Which
groups of soldiers are ready to reenlist, which are not?" or "How can
one improve the readiness for reenlistment?"


Readiness for reenlistment in several groups (per cent)

                          "I want to
                          become          "I will     "I am still    "I will not
Groups                    professional"   reenlist"   indecisive"    reenlist"

Army                           4              5           14             77
Air Force                      2              4           11             83
Navy                          12              4           18             66

servicemen                     2              4           12             83
corporals                      5             11           25             59
sergeants                     17              7           16             58

draftees                       1              3            7             90
enlisted men
for 2 years                    1              5           20             75
enlisted men
for 3 or 4 years               6              9           29             56
enlisted men
for 5 years or longer         24              7           19             50


Four of five Air Force soldiers, three of four Army soldiers
and two of three Navy soldiers do not want to extend their time
of service voluntarily.

Soldiers with a higher status show more readiness for reenlistment
or for becoming professional.

Our results made clear that there are considerable differences
between soldiers of the three forces with regard to job satisfaction
and readiness for reenlistment. Likewise the results
for the various status groups were not comparable. Therefore we
have made all regression analyses separately for Army, Air Force
and Navy as well as for the groups:

-  service under compulsory conditions (draftees)

-  enlistment for two years

-  enlistment for three or four years

-  enlistment for five years or longer

Though we cannot present all these single results here, we will
try to select some characteristic tendencies. The determinants
of readiness for reenlistment are shown in the following diagrams.
The hatched parts of the columns represent the portion of
unexplained variance. The numbers stand for the various predictors.
Some important ones are specified here:

29  affinity for military concerns
12  attitude of the female partner towards reenlistment
17  number of contacts with the union leader
3  future time of service (the more distant the date of a
decision, reenlistment or not, the greater the inclination
to do it)
2  seniority
28  prejudices towards the Armed Forces

We can see that for draftees and for short-term volunteers
affinity for military concerns (29) and attitude of the
girlfriends (12) are most relevant. For the subsamples of
long-term volunteers, attitude of the female partner (12), contacts
with the union leader (17), and prejudices towards the Armed Forces
(28) occupy the first ranks.

In a further step we have summarized the ten most important
predictors of the three criteria concerning reenlistment.


Average relevance of the predictors for the three criteria
READINESS FOR REENLISTMENT, EFFICIENCY AS MULTIPLICATORS and
EXTRINSIC MOTIVATIBILITY


1  Affinity  for  military  concerns 

2  Prejudices  towards  the  Armed  Forces 

3  Attitude  of  wife/girlfriend  towards  reenlistment 

4  Number  of  contacts  with  the  union  leader 

5  Satisfaction  with  the  conditions  of  the  work 

6  Satisfaction  with  the  superiors 

7  Adequate  function 

8  Distance  from  wife/girlfriend 

9  Satisfaction  with  the  payment 
10  Seniority 

Corresponding to the high portion of conscripts in our sample, the
emotional dimensions "affinity for military concerns" and
"prejudices towards the Armed Forces" have the greatest
relevance. The next ranks are occupied by social influences.
They show the importance of reference groups for reenlistment.
The attitude of the female partner and the frequency of contacts
with the union leader determine the decision in favour of or
against a reenlistment.

In comparison with an American study (3) we have analysed how far
the readiness for reenlistment is dependent on certain
material stimuli. We have called this "extrinsic motivatibility".
To measure this, nine more or less realistic stimuli were presented
to the soldiers as potential motivation for reenlistment. When only
asked for their general attitude, 72 per cent of them spontaneously
say that they will not reenlist. But when they are confronted with
these "extrinsic" incentives, only 14 per cent are left who
would by no means reenlist. In other words: 58 per cent of all
soldiers were ready to revise their original intention; they were
more or less venal.

Yet not all nine stimuli are equally effective for improving
the readiness for reenlistment.

What kind of conditions promise to be most effective?

Without regard to realisation:

more favorable leisure and compensation of overtime     rank 1
option of garrison and co-determination of removal      rank 3*
certain preferment                                      rank 3*
better payment                                          rank 3*
guarantee of garrison near residence                    rank 5
obligatory promises of development                      rank 6

With consideration to the most realistic measures:

obligatory promises of development                      rank 1
certain preferment                                      rank 2
more favorable leisure and compensation of overtime     rank 3.5
payment of an attractive enlistment premium             rank 3.5

* These conditions occasionally have the same rank.


By a suitable combination of the practicable measures 80 per cent
of all soldiers could be addressed.


4  Summary  and  Discussion 

As we assume a close connection between job satisfaction and
readiness for reenlistment, we have summarised all seven criteria
and investigated what kind of predictors are most significant.

We can see that altogether "affinity for military concerns"
represents the most important determinant, that means the attitude
towards the typical aspects of military jobs such as living in barracks,
wearing uniform, and living with the principle of command and
obedience. The significance of this emotional aspect concerning
the nearness or distance to the Federal Armed Forces decreases
with the duration of enlistment.

The next important factor of influence is the satisfaction with
the superiors. That makes clear once more the necessity of good
leadership behavior. That is what the German Federal Armed Forces
call "leadership and civic education".

Returning to our first-mentioned problem - i.e. a low rate of
reenlistments of volunteers or of enlistments by draftees -
we may ask whether the results of our investigation give any hints
for improving that situation.

We have found out that an increase of readiness for reenlistment
may be realised by certain "extrinsic" stimuli. Even soldiers
who actually are against any reenlistment would relent if offered
such permissions. Supposing these stimuli were practicable - would
such a "venality" be desirable if there are no positive emotional
attitudes of soldiers towards the Armed Forces? From our point of
view it is quite possible that attitudes may change and develop so
that they are in accordance with the decision that has been made
before. That means: after one has decided to reenlist he may look
at the military aspects in a more positive way than he had done
before; his affinity for military concerns increases.

The principal effects of inflation on testing of new weapon systems
are to compress schedules and thereby reduce the range and scope of system
performance which can be assessed. Test planners today must analyze system
performance goals and system design and then predict those few areas where
testing is most likely to pay off. Testing conducted only in accordance
with these predictions can fail to detect significant problems which will
show up when the system is eventually fielded. A new approach to conducting
task analysis may provide better indication to test designers of
where to anticipate problems and, hence, where to apply scarce testing
resources. This approach was developed by a tri-service committee of human
factors practitioners, and its concepts won an 80% indorsement of other
practitioners in government and industry who responded to a questionnaire.


TASK  ANALYSIS  FOR  WEAPONS  SYSTEMS  TESTERS: 
SHORTCUT  TO  PAYDIRT  IN  INFLATIONARY  TIMES 


John  L.  Miles,  Jr.,  J.O. 
United  States  Army  Research  Institute 
for  the  Behavioral  and  Social  Sciences 

While the battles of consumers against inflation may gain the most
media attention, the pernicious effects of inflation are felt as well by
weapon system testers. Most frequently these effects are felt on test
schedules, in which such budget constraints as "number of rounds of
ammunition available" dictate shorter and smaller tests. Test planners are
thus faced with the dilemma of whether to reduce the sample size on which
important generalizations are to be based or to eliminate whole subtests.
When this dilemma is resolved in favor of the statisticians, one of the
first subtests to be cancelled is often that of human factors. Historical
grounds for this decision appear to include the perceptions that: (1) no
Army system has ever been cancelled solely for human factors reasons, (2)
the legendary ingenuity of the American soldier will enable him to overcome
or at least compensate for disadvantageous design of equipment, and (3) if
the electrical and mechanical subsystems can be made to operate with
satisfactory reliability, the whole system is probably good enough to go to
the field.

There is growing evidence today that those grounds are being undermined.
Human factors deficiencies were prominent among the reasons for
Congressional cancellation of the Family of Military Engineering Construction
Equipment (FAMECE) project, and the prestigious Kerwin and Blanchard
report Man-Machine Interface - A Growing Crisis began, "The US Army has a
major man-machine interface problem... The problem is severe and will
continue to get worse" (p. 1). Thus, at a time when inflation is exposing
testing in general and human factors testing in particular to severe cuts
or elimination altogether, the criticality of the man-machine interface is
increasing. Prudent test planners are today therefore casting about for
more efficient means of determining which areas of system testing are
likely to have the highest payoff in terms of testing resources invested.

Human factors practitioners (particularly within the Army) have provided
ample guidance:

(1) A general life cycle system management model showing the integration
of human factors and identifying test and evaluation activities was
published in 1980 (Burt et al., 1980).

(2) A detailed guide for gathering and analyzing human performance
data during developmental testing was issued by the USA Human Engineering
Laboratory in 1976 (Berson and Crooks, 1976).

(3) A workbook and a handbook for assessing human performance during
operational testing (known as HRTES) were developed by the US Army Research
Institute (Kaplan et al., 1980).

Although each of those documents has been well received in the professional
community, there remains one persistent problem whose solution has
eluded the human factors and training community for years and is a necessary
prerequisite to efficient determination of high payoff areas for
investment of testing resources. That problem is determining and then
documenting what the humans in the system have to do (and, therefore, what
training they have to receive) to make it work properly. The means most
often used for this undertaking is task analysis.

Task analysis is of course not new, and its origins may lie with
the 1898 work of Frederick Taylor at the Midvale Steel Company. A paper
by Hays (in press) contains a recent review of the status of task analysis,
and another by Berry (1979) contains a succinct summary of the problem:

Modeling of a human-machine system, for whatever
purpose, requires that all significant events occurring
in the system be described. Task analysis, standard in
the repertoire of the human factors engineer, is the
technique generally used to describe the activities
performed by the human components, or operators.
Unfortunately, there is no agreement on the vocabulary or
structure to be used in making these descriptions.

Within the past 25 years, at least a dozen task
classification schemes, or taxonomies, have been proposed.
Even definition of the term Task is not universally
agreed upon, and this definition is critical
because it strongly influences the terms, units and
general flavor of the final taxonomy (p. 1).

The difficulties with task analysis are well-known to developers
of military systems. "Happy hour" conversations often contain stories of
defective task analyses, but virtue alone is not enough: during the late
1960s and early 1970s there was a joint German-American development program
for a main battle tank. The task analysis for that system when delivered
was 32 inches thick and no one was able to use it (Brogan et al., 1981,
p. 262). Attempts were begun in the late 1970s in the Department of
Defense to solve the problem of task analysis. As General Becton explained
the problem to the Army's Vice Chief of Staff,

A proper basis for both human engineering and human
factors to pursue their goals is a task analysis. This
is an analysis of those specific tasks an operator must
perform to operate the system. A valid task analysis
provides a logical basis for human engineering design,
training program development, training device requirements,
and other management considerations such as SQTs
and MOS prerequisites. Too often in the development of
Army equipment systems, task analyses are not done or are
done incompletely. This is primarily because no
military standard currently exists for task analysis, the
users of task analysis information have different requirements,
multiple formats are used, and the task
analysis must be called out as a deliverable data item to
be performed (pp. 1-2).

In  General  Becton's  view,  what  was  needed  was  a  military  solution  to  a 
technological  problem. 

An intrepid group of military and civilian trainers, testers and
human factors specialists drawn from all three armed services began
work on the problem in late 197**. By June of 1979 the group (officially
designated as the Test and Evaluation Subgroup of the Department of
Defense Human Factors Engineering Technical Advisory Group, but more
commonly known as the "T&E SubTAG") had reviewed the task analysis programs
in all three services, developed its own task taxonomy (see Figure 1),


Mission: What the man-machine system is supposed to accomplish.

Scenario/conditions: Categories of particular factors or constraints
under which the system will be expected to operate and be maintained.

Function: A broad category of activity performed by a man-machine
system.

Job: The combination of all human performance required for operation
and maintenance of one personnel position in a system.

Duty: A set of operationally-related tasks within a given job.

Task: A composite of related activities (perceptions, decisions,
and responses) performed for an immediate purpose, written in
operator/maintainer language.

Subtask: Activities (perceptions, decisions and responses) which
fulfill a portion of the immediate purpose within a task.

Task Element: The smallest logically and reasonably definable unit
of behavior required in completing a task or sub-task.


Figure  1.  Task  Taxonomy  from  Proposed  Military  Standard 


and outlined a proposed military standard on task analysis (Miles, 1979).
A questionnaire was subsequently developed at Edwards AFB, California and
sent by name to over a hundred practitioners of human factors and training
in government and industry to obtain their reactions to both the proposed
taxonomy and the idea of having a military standard serve as arbitrament in
this area. To the utter astonishment of the subTAG, more than 80% of the
survey respondents agreed with the proposal (which may be an indication
that people other than Army generals are tired of chaos in this area).
Some minor revisions were then made to the conceptual scheme, and the draft
military standard was prepared (Zavala, 1980).

The drafters of the standard had two innovative goals. First, they
wanted to use the same task data base for all of the specialty programs
(design, training, test and evaluation, manning and workload) which
traditionally have required task analysis. This was both for purposes of
economy (it's hard enough to get a project manager to buy one task
analysis -- let alone five) as well as the promotion of coordination among
the various specialists concerned with aspects of personnel and training.
To accomplish that goal, they established two reservoirs of data -- "task
inventory" which is rigidly controlled by the task taxonomy, and "supporting
data" which is everything else in whatever format that may be
needed to insure the accuracy or validity of the task analysis. Second,
they tried to reach a new height of specificity and flexibility -- the
former so that a contractor would know with precision in every case what
the desired task analysis should look like, and the latter to give the
government maximum freedom in terms of level of detail while still preserving
all of the controls inherent in the proposed standard. This was
accomplished in two primary ways: (1) by requiring that every task analysis
include two specific levels in the task inventory ("job" and "task")
and (2) by prescribing output format but not process. It was reasoned
that, with these requirements, both gross and detailed task analyses could
be obtained from the same data base for the same system (the latter by
adding more of the optional levels of the taxonomy) and that two entirely
different systems (e.g., a rifle and a jet aircraft) could still use the
same conceptual model of task analysis. A schematic of this model is shown
in Figure 2. In this model (and in the draft standard) both the input (at
least the task inventory portion) and the output (shown as "Data Item
Descriptions" on DD Forms 1664) are carefully prescribed; the process
called "task analysis" is not. Therefore the government planner may direct
that the contractor use anything from a stubby-pencil manual method to one
of the sophisticated ADP-assisted methods such as MOAT (see Helm and
Donnell, 1979) -- or even some new technique invented after the standard
was written.
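
The arrangement described above (a task inventory constrained by the taxonomy of Figure 1, with "job" and "task" as the two mandatory levels and the other levels optional) can be pictured as a small data structure. The Python sketch below is illustrative only. The class and function names, the rifle entries, and the rule that a child entry must sit at a deeper taxonomy level than its parent are assumptions made for the example; none of them is taken from the draft standard.

```python
from dataclasses import dataclass, field

# Levels of the task taxonomy in Figure 1, from broadest to most detailed.
TAXONOMY_LEVELS = [
    "mission", "scenario/conditions", "function", "job",
    "duty", "task", "subtask", "task element",
]
# The two levels the text says every task analysis must include.
REQUIRED_LEVELS = {"job", "task"}


@dataclass
class InventoryEntry:
    level: str
    description: str
    children: list = field(default_factory=list)

    def __post_init__(self):
        if self.level not in TAXONOMY_LEVELS:
            raise ValueError(f"unknown taxonomy level: {self.level}")


def levels_used(entry):
    """Return the set of taxonomy levels present in this subtree."""
    used = {entry.level}
    for child in entry.children:
        used |= levels_used(child)
    return used


def missing_required_levels(root):
    """Return the mandatory levels that the inventory does not contain."""
    return REQUIRED_LEVELS - levels_used(root)


def is_well_ordered(entry):
    """Assumed rule: each child sits at a deeper level than its parent."""
    depth = TAXONOMY_LEVELS.index(entry.level)
    return all(
        TAXONOMY_LEVELS.index(child.level) > depth and is_well_ordered(child)
        for child in entry.children
    )


# Gross analysis: only the two mandatory levels (hypothetical rifle example).
gross = InventoryEntry("job", "Rifleman", [
    InventoryEntry("task", "Clear a stoppage"),
])

# Detailed analysis of the same job: optional levels added.
detailed = InventoryEntry("job", "Rifleman", [
    InventoryEntry("duty", "Maintain the weapon", [
        InventoryEntry("task", "Clear a stoppage", [
            InventoryEntry("subtask", "Inspect the chamber"),
        ]),
    ]),
])

# An inventory that stops at the duty level omits the mandatory "task" level.
incomplete = InventoryEntry("job", "Rifleman", [
    InventoryEntry("duty", "Maintain the weapon"),
])

print(missing_required_levels(gross))
print(missing_required_levels(incomplete))
```

In this sketch the gross and the detailed inventories both satisfy the mandatory-level check, and they differ only in how many optional levels are filled in, which mirrors the reasoning that both kinds of analysis can come from the same data base.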

Use of this standard should be particularly helpful to testers of
military systems. Among the recommendations of Generals Kerwin and
Blanchard was "Manpower and skill level specifications and human performance
requirements must be developed, stated and used to the same degree
materiel specifications have been in the past" (Kerwin and Blanchard,
1980, p. 7). Means for implementing this recommendation were proposed
by Kaplan and Crooks (1980) based on integrating the draft task analysis
standard with their earlier efforts on HRTES. In those projects where that
recommendation is in fact implemented, objective, verifiable human performance
criteria will exist in requirements documents which can easily be
translated into test design plans and then into detailed test plans.


INPUT          PROCESS          OUTPUT


FIGURE 2. SCHEMATIC OF TASK ANALYSIS.


Summary 


The draft military standard on task analysis was created to bring
both order and standardization to the process of describing and documenting
what the humans in a military system are required to do to make it function
properly. Its use permits testers of military systems to identify quickly
those human performance criteria considered of primary importance in
obtaining the forecast level of system effectiveness.

USAF Behavioral Scientists (AFS 267X) are a relatively small operational
group who perform very diverse jobs, primarily in Air Training Command,
Air Force Systems Command, and the USAF Academy. Currently, there are
about 140 officer positions authorized in this specialty which range in
grade from Second Lieutenant to Lieutenant Colonel. An occupational
survey of Air Force behavioral scientists and related jobs was conducted
during the summer of 1981 to assess the types of jobs being performed,
organization of the specialty, career patterns, job interest and satisfaction,
and educational requirements for the various types of jobs.

The occupational data developed in this study will be used by Air Force
Manpower and Personnel Center personnel classification officials to
determine if subspecialty designators (shredouts) are needed and to
refine the current specialty description (AFR 36-1). The data will also
be available for use in the behavioral scientist career development
program. This paper will report the initial analysis of the occupational
data focusing primarily on the types of jobs performed by members
of the specialty. A more complete report of the behavioral scientist
occupational analysis project should be available in early 1982.


INTRODUCTION 


The  United  States  Air  Force  (USAF)  has  a  long  history  of  successful  behavioral 
science  research.  Many  of  the  diverse  lines  of  research  grew  out  of  World  War  II 
efforts,  such  as  the  pilot  selection  program  in  San  Antonio,  and  the  emerging 
field  of  human  factors  engineering  in  weapons  system  design  pioneered  at  the 
laboratories  at  Wright-Patterson  AFB.  Through  the  years,  some  of  the  best  known 
names in psychology have been part of the USAF behavioral science research
program, either as civilian employees or while they were military officers. A
substantial number of these behavioral scientists have gone on to make major
contributions in the academic and applied psychology areas. Most have
acknowledged the value of their Air Force experience, but some (cf. Jacoby 1970)
have been critical of how the Air Force made use of their talents and abilities.
Many  of  the  lines  of  research  and  the  operational  programs  which  have  been 
developed  in  the  USAF  research  program  require  an  on-going  supply  of  highly 
qualified  civilian  and  officer  behavioral  scientists.  The  present  study 
represents  an  attempt  to  help  define  the  jobs  of  military  behavioral  scientists 
and  to  examine  the  diversity  of  positions  (Driskill  and  Mitchell  1980)  within 
the  utilization  field.  Since  one  of  the  most  successful  operational  programs 
developed  by  the  USAF  has  been  the  task-based  occupational  analysis  system 
(Morsh  1964;  Christal  1974),  it  is  particularly  fitting  to  use  the  Air  Force 
CODAP  job  analysis  technology  to  study  Air  Force  behavioral  scientist  officer 
jobs. 


BACKGROUND 


The  present  study  emerged  out  of  a  number  of  concerns  as  to  how  the  USAF 
behavioral  scientist  utilization  field  (AFS  267X)  is  organized  and  how  the 
various  subspecialties  are  identified  and  controlled.  In  the  initial  Air  Force 
officer classification scheme of 1954, the field involved four specialty codes:
one for Research and Development Officer, and three for Human Resources Research
(see Figure 1). Between 1964 and 1976, the AFS 267X area was structured into
one specialty, with four subspecialty designators: A = Human Factors Psychologist;
B  =  Experimental  Psychologist;  C  =  Personnel  Measurement  Psychologist;  and 
Z  =  Other  social  scientists  (including  sociologists,  physical  anthropologists, 
cultural  anthropologists,  etc.).  In  1976  these  shredouts  were  dropped,  leaving 
one  basic  AFSC.  The  staff  level  specialty  (AFS  2616)  was  continued  as  a 
separate  code,  and  grouped  with  other  types  of  Air  Force  scientists,  such  as 
physicists,  chemists,  mathematicians,  etc. 

The authorized grade spread of the 267X field is 2Lt through Lt. Colonel.
Those selected for Colonel must give up the 267X designation and become either
a Staff Scientist (AFS 2616) or take another O-6 level code (such as Organization
Commander 0026, or Plans & Programs Officer 0076). Colonels with a behavioral
science background can be assigned to a variety of O-6 positions, and any O-6 can
be assigned to the key leadership positions in behavioral science organizations,
regardless of their technical backgrounds. This policy is quite consistent
with the Air Force "Whole Man" concept of officership in the Line officer
corps (versus technical qualification and rank progression in specialty corps
such as the medical, dental, legal, chaplain, and other areas).

In 1977, the Air Force personnel system developed Special Experience
Identifier (SEI) codes which were used to identify and track high-value
individuals, who might be needed for special assignments. Since the subspecialty
designators ("shredouts") for behavioral scientists had been deleted in 1976
to provide more assignment flexibility, it was thought that this new SEI system
could serve the same purpose, without the assignment limitations inherent in
AFSC shredouts.

During  1977  and  1978,  a  number  of  complaints  were  made  about  the  lack  of 
shredouts,  as  individuals  who  did  not  have  the  special  backgrounds  needed  to 
perform  specific  jobs  were  assigned  to  various  organizations.  The  Human  Factors 
positions  were  particularly  vulnerable.  Thus,  some  Human  Factors  graduates  (of 
programs  such  as  Purdue's  special  HFE  masters  under  McCormick)  were  assigned  to 
the  WAPS  test  development  program,  while  one  graduate  student  of  Dunnette  at 
Minnesota  with  a  major  in  motivation  theory  ended  up  at  the  Human  Factors  Branch 
at  the  Flight  Test  Center  at  Edwards  AFB  CA  as  a  human  factors  engineer. 

Such  assignment  problems  and  a  growing  concern  over  the  future  directions 
of  Human  Factors  Engineering  in  the  USAF  led  to  the  creation  within  the  Air  Force 
Systems  Command  (AFSC)  of  a  Human  Factors  Committee  to  examine  these  problems 
and  recommend  solutions.  One  suggestion  surfacing  through  this  committee  was 
to  transfer  all  Human  Factors  psychologists  from  AFS  2675  to  the  Medical  Specialties 
corps  to  enhance  the  career  potential  of  HFE  officers  and  better  control 
assignments.  Other  organizations  proposed  extensive  revision  of  the  SEI  codes 
to  permit  better  identification  and  tracking  of  subspecializations. 

In  the  context  of  these  various  developments,  the  Air  Force  Manpower  and 
Personnel  Center  (AFMPC)  classification  branch  received  an  unsolicited  proposal 
to  conduct  an  occupational  analysis  of  the  behavioral  scientist  field.  This 
proposal  was  generated  by  members  of  the  USAFOMC  occupational  analysis  branch 
who  suggested  that  it  be  conducted  on  a  part-time  basis,  since  the  total 
utilization  field  population  did  not  justify  a  normal  priority  occupational 
survey  project.  AFMPC  approved  the  request  for  the  project,  which  had  the 
support  of  the  2675  Career  Development  Manager,  and  an  AFPT  control  number 
was  issued  to  authorize  the  survey. 


INVENTORY  DEVELOPMENT 


As a starting point in developing a task list for the 267X field, Air
Force personnel documents, such as AFR 36-1, Officer Specialty Descriptions,
were  screened  to  identify  basic  duties  and  responsibilities  of  USAF  behavioral 
scientists.  In  addition,  a  set  of  special  job  descriptions  for  267X  officers 
were  recovered  from  AFHRL  files.  In  1974-75,  AFHRL  had  conducted  a  special 
study  of  all  officer  AFSCs  by  collecting  narrative  job  descriptions  from  a 
representative  sample  of  positions.  Twenty-two  behavioral  scientist  position 
descriptions  were  located  in  this  file  and  served  as  a  foundation  for  preliminary 

FIGURE 1. EVALUATION OF THE AIR FORCE BEHAVIORAL SCIENTIST UTILIZATION FIELD


AFS 267X USAF JOB INVENTORY DUTY AREAS


DUTY

A  GENERAL COMMAND FUNCTIONS
B  SUPERVISORY FUNCTIONS
C  ADMINISTRATIVE FUNCTIONS
D  GENERAL FUNCTIONS
E  PROFESSIONAL DEVELOPMENT FUNCTIONS
F  CONSULTANT FUNCTIONS
G  LIAISON FUNCTIONS
H  CONTRACT MONITORING FUNCTIONS
I  COUNSELING FUNCTIONS
J  RESEARCH FUNCTIONS
K  APPLICATIONS OF RESEARCH
L  MANAGING RESEARCH OR APPLICATIONS PROGRAMS
M  ORGANIZATIONAL IMPROVEMENT FUNCTIONS
N  ACADEMIC INSTRUCTOR FUNCTIONS
O  HUMAN FACTOR ENGINEERING (HFE) FUNCTIONS
P  PROMOTION TEST CONSTRUCTION FUNCTIONS
Q  OCCUPATIONAL ANALYSIS FUNCTIONS

task list development. Similar blank forms were reproduced by USAFOMC and
mailed to about 30 behavioral scientists to update position descriptions and
capture recently developed jobs. As the opportunity presented itself during TDY
visits for other purposes (in the normal occupational analysis program), the
authors interviewed over 30 behavioral scientists at Wright-Patterson AFB,
USAF Academy, Lowry AFB, Keesler AFB, Norton AFB, Gunter AFS, and Randolph AFB.
Over one-third of the members of the specialty were contacted either by mail
or through personal interviews.

The  wide  geographical  dispersion  of  jobs  and  the  variety  of  one  and  two- 
deep  positions  became  very  obvious  during  this  phase  of  the  project,  as  did 
the  dynamic  nature  of  the  utilization  field.  New  jobs  were  being  created  and 
old  jobs  deleted  quite  frequently  during  this  period.  A  new  Lt  Colonel's 
position  was  created  at  Gunter  AFS  to  support  human  factors  of  computer  systems 
operations.  A  new  2675  position  was  created  in  Air  Force  Logistics  Command 
headquarters (Wright-Patterson AFB) to evaluate AFLC job enrichment programs.
Three or four positions for Captain human factors psychologists were switched
to civilian positions at Edwards AFB CA when no qualified officers (with HFE
degrees or experience) could be identified for assignment. The last experimental
psychologist position (Armed Forces Radiological Research Institute, Bethesda
MD) was deleted upon the incumbent's reassignment. He was replaced by a
Veterinarian since no PhD-level experimental psychologist was available. These
kinds of shifts in jobs made the development of a comprehensive occupational
survey  instrument  difficult. 

Once this phase of the inventory development process was completed, a
relatively short task list containing 330 tasks grouped under 17 major duty
headings was tentatively developed. Because of the relatively small population
(about 140 officer positions) and diverse number of jobs, the task list was
written at a more general level of specificity than is normally the case. With
small fields such as this, only a few tasks per known job group will adequately
differentiate clusters and job types. Thus, a fairly long and detailed task
listing was deemed unnecessary.

The  Job  Inventory  was  organized  functionally,  with  general  duties  listed 
first  and  more  specialized  duties  following.  Duty  A,  General  Command  Functions, 
contained  17  tasks  (conduct  Commander’s  Call,  etc.)  which  might  be  performed 
by  organizational  commanders.  Duty  B  listed  42  tasks  which  most  supervisors 
would perform (write Officer Efficiency Reports, etc.). Duties C and D outlined
Administrative and General Functions which might be performed by most officers,
and Duty E detailed Professional Development Functions both for officers in
general and for psychologists in particular. The remaining duties (F through Q)
related to specific tasks performed by members of some known behavioral science
job  groups  (counselors,  researchers,  technology  applications,  HFE,  test 
development,  occupational  analysis,  etc.).  These  duty  headings  and  the  number 
of  tasks  for  each  are  displayed  in  Figure  2. 

A  fairly  extensive  background  section  was  also  included  in  the  Job  Inventory 
ranging  from  personal  identification,  education  level,  academic  specialization, 
etc. to standard questions concerning job interest and satisfaction (see Figure
3). These types of data facilitate the identification of job groups during
analysis and permit a more detailed look at potential problem areas within
the utilization field. Such data may be displayed by grade, job type, or
organization to highlight differences in groups or to identify particular
jobs or areas where morale may be an issue. Finally, the Job Inventory also
included a special section on earlier assignments in an experimental attempt
to determine possible career progression patterns.

The final job inventory was then validated through comprehensive reviews by
senior behavioral scientists at AFMPC, AFHRL, and USAFOMC. In addition, the
AFMPC Career Development Manager also reviewed the inventory.


FIELD ADMINISTRATION


The  printed  inventory  was  administered  by  direct  mailing  to  organizations 
which  utilize  267X  officers,  and  to  those  individuals  who  were  in  remote  locations 
or  one  deep  positions.  Current  membership  of  the  utilization  field  was 
determined by a roster furnished by the AFMPC Career Development Manager. The
inventory was mailed to the field in May and June 1981, with the request for
return NLT mid-July. In July, follow-up telephone calls were made with units
and individuals who had not yet responded. Survey administration was closed
in August.

Inventory  booklets  were  received  from  a  substantial  cross  section  of  the 
agencies  which  use  USAF  behavioral  scientists  (see  Figure  4).  Air  Training 
Command  has  the  most  behavioral  scientists  assigned  in  a  variety  of  diverse 
jobs;  the  technical  training  positions  include  Technology  Applications  officers 
at  the  five  ATC  Technical  Training  Centers.  Air  Force  Systems  Command  is  the 
second largest user, including many one- or two-deep Human Factors and research
positions. These include the USAF School of Aerospace Medicine, Electronic
Systems  Division,  Flight  Test  Center,  Aeronautical  Systems  Division,  and 
others.  A  few  positions  exist  with  DOD  and  the  operational  commands.  In  most 
cases, incumbents held either the 2671 (entry level) AFSC or the 2675 (fully
qualified). To ensure a complete picture, a small sample of members of other
specialties was also asked to participate. These included a small number of
Colonels  (who  cannot  hold  the  2675)  who  progressed  from  behavioral  science 
assignments.  Where  members  of  other  specialties  worked  side  by  side  with 
behavioral  scientists,  they  were  also  surveyed.  This  included  Education  and 
Training  Officers,  Instructors,  and  Staff  Scientists.  In  one  case,  a  Senior 
Master  Sergeant  was  also  included,  since  he  performed  the  same  job  as  some 
2675  officers. 

Generally, the sample is very representative of the utilization field,
with about 80 percent of all 267X officers included in the study (see Figure
5). A few jobs at some locations are not included for a variety of reasons.
Several individuals who were retiring or leaving the USAF declined to
participate. In one case at AFHRL at Williams AFB, all three job incumbents
were  either  leaving  service  or  had  already  departed  for  reassignment,  and 
replacement  officers  were  not  yet  in  place.  While  exclusion  of  these  members 
could  lead  to  some  sample  bias,  the  amount  of  bias  is  considered  to  be 
minimal  since  80  percent  of  the  field  is  captured  in  the  final  sample. 

PRELIMINARY RESULTS


Since analysis of the survey data has just begun this month, it is
not possible to present detailed findings in this paper. However, since there
are several high interest items, such as job satisfaction and career plans,
which  can  be  easily  addressed  at  this  time,  some  preliminary  results  are 
presented  for  these  areas.  In  addition,  a  brief  overview  of  the  job  structure 
will  be  discussed. 

Figure  6  displays  the  perceptions  of  behavioral  scientists  in  various 
grades  of  the  organizational  climate  of  their  unit.  As  you  will  note, 
there  appear  to  be  some  substantial  differences  among  the  grade  groups,  with 
Lts  being  generally  less  satisfied  than  other  groups.  This  is  perhaps  not  a 
startling  conclusion;  however,  we  also  see  basically  the  same  pattern  when 
we  examine  responses  to  the  question  on  job  interest  (see  Fig.  7).  Again, 
the  extreme  responses  seem  to  be  in  the  Lts  and  Colonel  groups.  A  slightly 
different pattern was noted for the question as to how the job utilizes one's
talents  (see  Fig.  8).  Here,  Majors  seem  to  be  most  satisfied,  with  the  Lts 
again  having  a  greater  proportion  of  dissatisfied  individuals. 

When  we  asked  what  sense  of  accomplishment  behavioral  scientists  received 
from  their  jobs,  the  pattern  was  similar  to  earlier  questions  (see  Fig.  9). 

Again,  this  is  hardly  a  surprising  finding,  since  the  Lts  group  contains  some 
individuals  who  may  not  choose  a  full  military  career.  We  hope  to  be  able  to 
develop a comparative data base from another officer study, the Professional
Military  Education  project  (Barucky  1980),  to  compare  behavioral  scientists 
with  the  Air  Force  population  at  large.  Those  results  will  be  presented  at 
a  later  time. 

Figure  10  displays  career  plans  by  major  organizational  unit.  As  you 
can  see  from  this  data,  a  majority  of  the  individuals  in  Air  University 
and  the  USAF  Academy  plan  to  remain  in  behavioral  science  careers,  while 
a  substantially  lower  percentage  of  officers  assigned  to  Air  Training  Command 
have  similar  plans.  This  may  be  a  function  of  the  larger  population  in  ATC 
(N=67),  with  a  higher  proportion  of  very  junior  officers.  This  is  the  kind 
of question we will be examining in detail as we get further into the analysis
of  these  data. 

One  of  the  major  objectives  of  this  study  was  to  identify  the  major  types 
of  jobs  which  exist  in  the  utilization  field.  We  were  able  to  use  a  new 
experimental  CODAP  routine  called  CORSET,  which  Mr.  Bill  Phalen  and  Mr.  Johnny 
Weissmuller are reporting in another session of this conference (Phalen &
Weissmuller 1981). This program permits a very quick analysis of the tasks which
discriminate the various groups formed in the hierarchical clustering process
to  establish  which  are  significant  groups.  Using  this  new  technique,  we  were 
able  to  collapse  the  38  starter  groups  on  our  diagram  (W=2)  into  25  job  types 
which  clustered  into  seven  major  clusters.  These  include:  Research  Programs 
Officers  (35%  of  the  sample);  Functional  Unit  Supervisors  (10%);  Academic 
Instructors-Counselors  (10%);  Junior  Task  Scientists  and  Students  (9%);  Occupational 
Analysts  (15%);  Human  Factors  Engineering  Researchers  (7%);  and  Test  Development 
Psychologists (12%). These major types of jobs include 98% of the sample. The
remaining  individuals  were  filling  a  number  of  one-deep,  unique  positions. 

Each  of  the  major  clusters  is  composed  of  more  discrete  job  types,  which 
represent  variations  of  tasks.  For  example,  the  Research  Program  Officers  cluster 
includes  the  following  groups:  Personnel  Research  Program  Managers,  Technology 
Application  Researchers,  Technology  Applications  Staff  Officers,  Plans  Staff 
Officers,  Senior  Academic  Staff  Officers,  Contract  Monitors,  Test  Development 
Research, MPC Attitude Researchers, and Air War College Evaluation Research. As
might be surmised from the various titles, each group appears to specialize in
research activities involving a different organizational mission or program and
to perform a slightly different set of specific tasks.

The  Academic  Instructor-Counselor  cluster  is  composed  of  the  following 
groups:  USAF  Academy  Instructors;  USAFA  Instructor-Counselors;  AFROTC 
Instructor-Counselors;  and  Other  Instructors.  As  these  names  imply,  some 
USAFA faculty members perform primarily as classroom teachers (another group
identified in the first major cluster both instructs in the classroom and
conducts research projects). A separate group both instructs and serves as
counselors.  This  group  includes  individuals  assigned  to  the  Cadet  Counseling 
Center,  which  is  now  part  of  the  Department  of  Behavioral  Science  and  Leadership. 
Their  "core"  tasks  suggest  that  they  are  being  used  both  as  counselors  and  as 
classroom  instructors.  Interestingly,  the  AFROTC  instructors  included  in  this 
study  are  more  similar  to  the  USAFA  Instructor-Counselors  than  to  the  Instructor 
group.  Their  "core"  tasks  reflect  considerable  personal  counseling  outside  of 
the  classroom.  The  Other  Instructor  group  includes  several  unique,  one-of-a-kind 
faculty positions, such as the DOD Human Relations Institute at Patrick AFB,
FL. The Other Instructors appear to group together through their shared
instructing, counseling, and course development tasks.

Most  of  the  other  major  job  groups  involved  highly  specialized  functional 
programs.  In  the  Occupational  Analysts  group,  there  were  three  distinct  job 
types which included Airman Analysis (OMYO), Inventory Development (OMYV), and
Management  Applications  and  Officer  Analysis  (OMYA)  personnel,  thus  replicating 
the  formal  sections  of  our  branch  (some  section  chiefs  grouped  in  the  supervisor 
job  group).  Interestingly,  the  present  USAF-Royal  Australian  Air  Force  Exchange 
officer  grouped  with  OMYA  analysts,  even  though  the  incumbent  has  never  been 
assigned to the USAFOMC. This is very realistic since we know that he does both
inventory development and data analysis, as do the members of OMYA. One member
of  the  Canadian  Armed  Forces  occupational  analysis  program  also  appeared  in  the 
job  group  (he  completed  the  survey  in  place  of  the  newly  assigned  USAF-Canadian 
Forces  exchange  officer  who  reported  for  duty  in  July).  These  groupings  were 
particularly  satisfying  since  they  give  some  external  validity  to  our  survey 
results. 

The  Human  Factors  Engineering  Researchers  formed  a  very  discrete  job  group 
with  very  little  overlap  with  other  groups  (although  their  supervisors  and  staff 
personnel  did  appear  in  the  first  cluster).  Members  of  this  group  represent  very 
different organizations including ASD, ESD, the Flight Test Center, and others.
Their  "core"  tasks  reflect  a  concentration  of  6.3  and  6.4  development  efforts 
not  shared  by  any  other  group. 

Finally,  the  WAPS  Test  Development  Psychologists  assigned  to  the  USAFOMC 
have  very  distinct  jobs.  Two  job  types  within  this  cluster  include  test 
development psychologists and test review psychologists. Their jobs focus on
procedures  for  developing  promotion  Specialty  Knowledge  Tests  and  the  quality 
control  of  such  tests. 

It  is  not  possible  in  this  brief,  preliminary  report  to  examine  these  job 
groups  in  any  great  detail.  A  more  complete  report  of  the  occupational  analysis 
project is to be published next spring, and we hope to be able to give a more
detailed report at a later date (perhaps the Psychology in the DOD Symposium
at the USAF Academy in April, or some other forum).

It is possible, however, to conclude from the preliminary results of this
occupational analysis that behavioral science is a very diverse utilization
field.  A  quick  look  at  job  perceptions  of  incumbents  suggests  that  there  may  be 
some  problems,  particularly  among  the  more  junior  officers,  and  that  these 
problems  may  vary  by  major  command  or  organization.  We  are  hoping  that  this 
occupational  analysis  may  help  solve  some  of  the  problems  through  better 
identification of jobs and highlighting of potential areas requiring action. Before
such  results  can  be  achieved,  a  considerable  amount  of  further  analysis  is 
required. 


References 


Barucky, J.M., Officer Professional Military Education Curriculum Validation
Project.  Occupational  Survey  Report,  Occupational  Analysis  Program,  USAFOMC, 
Randolph  AFB,  TX,  1980. 

Christal,  R.E.,  The  United  States  Air  Force  Occupational  Research  Project. 
AFHRL-TR-73-75,  AD-774  574.  Lackland  AFB,  TX:  Occupational  Research 
Division,  Air  Force  Human  Resources  Laboratory,  January  1974. 

Jacoby, Jacob, "The Plight of the Uniformed Air Force Psychologist."
Professional Psychology, 1970, Summer, Vol 1(4), 383-387.

Mitchell, J.L. and Driskill, W.E., Variance Within Occupational Fields:
Jobs Analysis Versus Occupational Analysis. Paper presented at the 21st
Annual Conference of the Military Testing Association, San Diego, CA, 1979.

Morsh,  J.E.,  "Job  Analysis  in  the  United  States  Air  Force."  Personnel 
Psychology,  1964,  17(1),  7-17. 

Phalen, W.J. and Weissmuller, J.J., CODAP: Some New Techniques to Improve
Job-Type Identification and Definition. Paper presented at the 23rd Annual
Conference  of  the  Military  Testing  Association,  Washington,  D.C.,  1981. 


AD  P001356 


Modrick, John A., Plocher, T. A. & Hutcheson, J. D., Honeywell, Inc.,
Minneapolis,  Minnesota;  Chambers,  R.  M.,  US  Army  Research  Institute  for 
the  Behavioral  and  Social  Sciences,  Alexandria,  Virginia.  (Tues.  A.M.) 


Performance and Skill Level Requirements for Fire Support Teams


This research developed a Task Data Base for Fire Support Teams
(FIST) and identified 48 tasks, grouped them into six functional areas,
classified them as either procedural or semi-structured, and ranked
them according to criteria of criticality and performance. The FIST
Task Data Inventory resulted from the task analyses and integration of
task descriptions and information obtained from questionnaires, interviews,
and observations administered at three CONUS and four USAREUR
divisions. Statistical analyses of the data indicated that (1) criticality
and performance were negatively correlated, (2) procedural tasks
were the least critical and best performed, (3) semi-structured tasks
were the most critical and poorest performed, (4) task difficulty was
the principal factor in ratings of criticality, and (5) traditional
tasks were performed better than non-traditional tasks. The results of
these analyses, and the utilization of the FIST Task Data Inventory,
are discussed in terms of personnel and training assessments, simulation
and training device recommendations, task analyses methodologies,
and selection criteria.

The purpose of this research was to examine the needs and problems in
manpower  and  training  which  have  arisen  during  implementation  of  the  Fire 
Support  Team  (FIST).  The  work  was  performed  under  Contract  No.  MDA903-99-C- 
9669  with  the  Army  Research  Institute,  November  1,  1979,  to  March  30,  1981. 

The FIST is a new entity which replaces the traditional forward observer
with a team which is intended to provide wider coverage, greater mobility, and
effective integration of improved munitions. The FIST consists of forward
observer parties deployed with platoons to provide flexibility, mobility, and
range, and a headquarters element at the company command post to provide integration,
coordination, planning, and responsibility for the company's scheme
of  maneuver. 

FISTs  have  been  formed  by  training  new  personnel,  by  reclassification, 
and  by  assignment  to  the  field.  This  process  required  reorganization  and  new 
development  of  training  at  both  resident  school  and  unit. 

OBJECTIVES  AND  APPROACH 

There were three objectives of the study: 1) Specify the performance and
skill  level  requirements  for  personnel  assigned  to  FISTs;  2)  Determine  the 
degree  to  which  performance  and  skill  level  requirements  are  being  met,  and 
identify  the  shortfalls  which  exist  in  manning,  organization,  equipment,  and 
training  of  FISTs  in  the  field;  and  3)  Project  the  probable  impact  that  future 
field  artillery  systems  will  have  on  the  performance  and  skill  requirements 
for  FIST. 

The  approach  consisted  of  five  tasks:  1)  Develop  a  task  data  base  for 
FIST;  2)  Collect  interview  and  questionnaire  data  from  fire  support  personnel 
in  fielded  units  in  CONUS  and  USAREUR;  3)  Analyze  interview  and  questionnaire 
data  for  problems  or  deficiencies  in  performance,  and  training  needs  and 
contributing  factors;  4)  Estimate  the  effect  of  future  weapon  systems  on  job 
task  and  training  requirements  for  FIST;  and  5)  Summarize  FIST  training  and 
performance deficiencies, and make recommendations to correct them.

DEVELOPMENT OF A TASK DATA BASE FOR FIST


A  Task  Data  Base  was  compiled  by  integrating  several  sources;  no  single 
source  provided  an  adequate  list  of  tasks  or  adequate  task  data.  It  was 
necessary  also  to  supplement  this  information  by  interviews  with  experts  and 
by  analysis.  The  printed  sources  of  task-descriptive  information  were: 
Soldiers' Manuals FM6-13F 1/2 and 3; ARTEP 6-365 FA Battalion, 155mm DS;
ARTEP 6-105 Battalion, 105mm DS; Training Circular 6-20-10 FIST, Task Analysis
13F, Directorate of Course Development DCRDT; Task Analysis, Close-Support
Study Group II; Training Text 6-20-7, FIST/FAC Operations; and Weapon Systems
Training Effectiveness Analysis - Forward Observer. The resulting list of
tasks provided a comprehensive inventory of job tasks. It constituted a data
base which was used to structure interviews, performance analyses, and training
analyses in the later phases of the study.

The data on each task were summarized on a task description worksheet
which  contained  the  following  categories  of  information:  task  number;  duty 
(major  functional  area)  to  which  the  task  belongs;  task  description;  task 
assignment  to  positions  within  FIST,  as  indicated  or  implied  by  doctrine  and 
as  implemented  in  current  field  practice;  task  criticality  rank  and  narrative 
evaluation  of  task  criticality;  references  for  task  information;  and  listing 
of  component  subtasks,  if  applicable,  and  references  to  the  source  of  subtask 
descriptions. These data were obtained by analysis of responses to a questionnaire
and interviews. Functional flow diagrams were also prepared for
nonprocedural, semi-structured tasks, or a set of related tasks, which were
not amenable to description by a fixed sequence of subtasks or steps.

COLLECTION  OF  INTERVIEW  AND  QUESTIONNAIRE  DATA 

Questionnaires  were  administered  and  interviews  conducted  among  personnel 
of  FIST  units  in  the  field,  related  fire  support  organizations,  and  commanders 
of  maneuver  units.  The  purpose  of  the  questionnaire  was  to  obtain  information 
on  the  military  occupational  specialty  (MOS)  background  of  FIST  personnel, 
nature  of  a  company  commander's  experience  with  the  FIST,  and  ratings  of  tasks 
on  criticality  and  quality  of  performance  by  the  FIST.  The  purpose  of  the 
interview  was  to  obtain  less  easily  structured  information  such  as  factors  and 
considerations influencing a respondent's ratings and perceptions of the adequacy
of training.

The  questionnaire  consisted  of  Background  and  Task  Inventory  sections. 
Different  forms  were  used  for  FIST  personnel  and  the  company  commander.  For 
FIST participants it consisted of length of service, MOS, and military training
history; for company commanders it consisted of the nature of their exposure
to the FIST concept prior to assuming a company command. The Task Inventory
section was designed to obtain evaluations of each FIST task on five attributes:
Task Assignment, in terms of the FIST member responsible for the task;
Performance Rating from combat-ready to totally inadequate; and Task Criticality
based on three estimates. They were: Task Difficulty (not difficult to
extremely difficult); Consequences of Inadequate Performance (catastrophic to
none); and Detectability of Error (undetectable to immediately detectable by
the person committing the error). Criticality was not intended to reflect
importance of a task but properties of tasks that are significant in determining
priorities for training.

Only  FIST  Chiefs  made  evaluations  on  all  five  scales.  FIST  enlisted 
personnel  responded  to  Task  Assignment  with  a  Yes  or  No  to  indicate  whether 
they  had  performed  that  task  in  their  assigned  job  and  rated  task  difficulty; 
company  commanders  rated  only  performance  and  consequences  of  inadequate 
performance. 

Interviews were conducted in three divisions in the continental U.S. and
three  in  Europe.  Questionnaires  were  administered  to  86  and  137  persons  in 
the  U.S.  and  Europe,  respectively;  interviews  were  conducted  with  67  and  94 
persons.  An  interview  protocol  was  prepared  for  each  category  of  respondent 
to be used as a checklist of coverage to guide the interview. A primary concern
was to allow the respondent latitude in his responses and comments and
permit  the  interviewer  freedom  to  follow  up  and  elaborate  on  ratings  and 
comments. 

The  results  are  organized  into  Task  Data,  Training,  Personnel,  and 
Organizational  Factors. 

FIST  TASK  DATA  BASE 

The  information  from  the  printed  sources  was  combined  with  task  data 
obtained by analysis of responses to the questionnaires and interviews. Forty-eight
tasks were identified and grouped into six functional areas or duties.
The functional areas and number of tasks in each are: I. Plan fires to
support maneuver unit operations (9); II. Prepare/maintain/disseminate fire
support information (6); III. Manage fire support communication system (5);
IV. Manage use of fire support assets at maneuver unit level (3); V. Acquire
targets of opportunity (4); and VI. Request/adjust fires (21). Information
flow diagrams were prepared for some tasks and sets of tasks to
depict the interdependencies among the tasks. They can be used as a basis to
plan and operate simulations and training exercises for the purpose of integrating
the component activities into an effective operation.
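As an illustration only, the six functional areas and their task counts can be held in a small lookup table and checked against the total of 48 tasks. The names `FUNCTIONAL_AREAS`, `total_tasks`, and `share_of_inventory` are hypothetical and are not part of the FIST Task Data Inventory itself; the counts are those listed in the paragraph above.

```python
# Task counts per functional area, copied from the listing in the text.
# The dictionary and helper names are illustrative assumptions.
FUNCTIONAL_AREAS = {
    "I. Plan fires to support maneuver unit operations": 9,
    "II. Prepare/maintain/disseminate fire support information": 6,
    "III. Manage fire support communication system": 5,
    "IV. Manage use of fire support assets at maneuver unit level": 3,
    "V. Acquire targets of opportunity": 4,
    "VI. Request/adjust fires": 21,
}


def total_tasks(areas):
    """Total number of tasks across all functional areas."""
    return sum(areas.values())


def share_of_inventory(areas):
    """Fraction of the whole task inventory that falls in each area."""
    total = total_tasks(areas)
    return {name: count / total for name, count in areas.items()}


if __name__ == "__main__":
    print("Total tasks:", total_tasks(FUNCTIONAL_AREAS))
    for name, share in share_of_inventory(FUNCTIONAL_AREAS).items():
        print(f"{name}: {share:.0%}")
```

A tally of this kind makes it easy to see that Request/adjust fires alone accounts for 21 of the 48 tasks.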

A  set  of  basic  skills  too  molecular  to  be  components  of  tasks  emerged  as 
tasks  were  analyzed  in  greater  detail.  They  are  basic  skills  of  fire  support 
and are common prerequisites to mastering several tasks representing entry-level
skills. The skills were grouped into the following six categories:
Basic Forward Observer Procedures; Intelligence Reporting; Basic Map Reading;
Basic Radio Procedures; Maintenance; and Basic Fire Support Procedures.

The tasks were also classified as procedural and semi-structured.
Procedural tasks have a fixed sequence of steps or operations; initiating
conditions and contingencies are known; and standards of performance are known in
the form of one correct response or outcome. They occur in established situations.
Semi-structured tasks have a sequence of steps or operations which
varies depending on the outcome of the prior step; initiating events and conditions
vary with situations; standards of performance are not known or determined;
and a multiplicity of appropriate responses exists. They occur in emergent
situations. Current training technology is oriented predominantly toward
procedural tasks, while semi-structured tasks are new to the fire support community.
They involve planning, coordination, and integration of fire support.

Mean  performance  and  criticality  ratings  were  computed  for  each  task  and 
the  tasks  were  put  in  rank  order.  The  ratings  of  the  three  criticality  scales 
were combined to provide a single composite index of criticality. The relationship
between the rank order of performance and criticality was estimated
by Spearman's rho and found to be -0.67. Perceived level of performance declines
as the tasks increase in criticality.
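The rank-order computation described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, and the five pairs of mean ratings are invented solely to show the mechanics (they are not FIST data). Spearman's rho is computed here as the Pearson correlation of the ranks, with tied values sharing the average rank.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented mean ratings for five tasks (illustration only).
    criticality = [1.2, 2.5, 3.1, 3.8, 4.4]
    performance = [4.6, 4.1, 3.9, 2.7, 3.0]
    print(round(spearman_rho(criticality, performance), 2))
```

A negative coefficient, as reported for the FIST tasks, means that tasks ranked higher on criticality tend to be ranked lower on performance.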

The  tasks  were  divided  into  thirds  on  each  dimension.  The  most  critical 
and poorest performed (MC/PP) third and the least critical and best performed
(LC/BP) third of the tasks were analyzed. The LC/BP tasks are procedural
with one exception. The MC/PP tasks are semi-structured and procedural tasks
on  which  FIST  personnel  get  little  practice  or  experience  such  as  Request/ 
Direct  Close  Air  Support. 
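One way to picture the division into thirds is sketched below. The task identifiers and scores are invented for illustration, the helper name `extreme_third` is hypothetical, and the composite criticality index is taken as already computed, since the text does not state how the three scales were combined.

```python
def extreme_third(scores, highest=True):
    """Return the task ids in the top (or bottom) third of a score table."""
    ordered = sorted(scores, key=scores.get, reverse=highest)
    return set(ordered[: len(ordered) // 3])


# Invented composite criticality and mean performance ratings (illustration only).
criticality = {"T1": 4.5, "T2": 4.1, "T3": 3.0, "T4": 2.8, "T5": 1.9, "T6": 1.5}
performance = {"T1": 2.0, "T2": 3.5, "T3": 2.2, "T4": 3.6, "T5": 4.4, "T6": 4.0}

# Most critical AND poorest performed: candidates for training emphasis.
mc_pp = extreme_third(criticality, highest=True) & extreme_third(performance, highest=False)

# Least critical AND best performed.
lc_bp = extreme_third(criticality, highest=False) & extreme_third(performance, highest=True)

if __name__ == "__main__":
    print("MC/PP:", sorted(mc_pp))
    print("LC/BP:", sorted(lc_bp))
```

With 48 tasks each third would hold 16 tasks; the two intersections then isolate the tasks at either extreme on both dimensions at once.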

Task  difficulty  is  the  major  factor  of  the  criticality  ratings.  The 
partial  correlation  between  performance  ratings  and  task  difficulty  is  0.74 
(p < 0.001) controlling for the covariance with consequences of inadequate
performance and detectability of errors. Comparable partial correlations of
performance ratings with consequences and detectability of errors are 0.03
and -0.33 (p < 0.05), respectively. The tasks for which errors are more
detectable  are  performed  better. 
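For readers unfamiliar with partial correlation, the standard recursive formula can be sketched as below. This is an illustration under stated assumptions, not the study's own computation: the function names are hypothetical, and the zero-order correlations of performance with each criticality scale are invented, because the paper reports only the partial coefficients. The three correlations among the criticality scales are those reported in this paper.

```python
from math import sqrt


def corr_table(pairs):
    """Build an order-independent lookup from {(a, b): r} entries."""
    return {frozenset(key): value for key, value in pairs.items()}


def partial_corr(r, x, y, controls):
    """Partial correlation of x and y with the listed control variables removed.

    Uses the standard recursion
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    applied once for each control variable.
    """
    if not controls:
        return r[frozenset((x, y))]
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(r, x, y, rest)
    r_xz = partial_corr(r, x, z, rest)
    r_yz = partial_corr(r, y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    r = corr_table({
        # Invented zero-order correlations with performance (illustration only).
        ("performance", "difficulty"): -0.70,
        ("performance", "consequences"): -0.20,
        ("performance", "detectability"): 0.30,
        # Correlations among the criticality scales as reported in the paper.
        ("difficulty", "detectability"): -0.22,
        ("difficulty", "consequences"): 0.30,
        ("detectability", "consequences"): -0.33,
    })
    value = partial_corr(r, "performance", "difficulty",
                         ["consequences", "detectability"])
    print(round(value, 2))
```

The same function gives the partial correlations for the other two scales by swapping which variable is correlated with performance and which two are held constant.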

The correlations among the criticality dimensions indicate independence
among these dimensions. They are: task difficulty/detectability (-0.22);
task difficulty/consequences of inadequate performance (0.30, p < 0.05); and
detectability  of  errors/consequences  of  inadequate  performance  (-0.33). 

TRAINING 

Resident  school  training  is  viewed  as  good  and  unit  training  as  poor. 
Training  courses  provided  at  the  Field  Artillery  School  in  Officer's  Basic, 
Cannon  Battery  Officers',  Basic  NCO,  and  Advanced  Individual  Training  are  good. 
However, they focus largely on the procedural tasks; less proceduralized tasks
in planning, coordinating, and integrating fire support are not adequately
covered or exercised. Unit training programs are not adequate in FIST participation
in field exercises for maneuver units, unit training for FIST, and
reclassification training.

The characterization of unit training provided by the survey is represented
in the following responses. Seventy-three percent of enlisted respondents
in CONUS and USAREUR reported that MOS unit training is not
sufficient to become proficient at one's job and skill level. Over half the
respondents cited training shortfalls as a major source of performance deficiency.
An average of 9.2 hours/week of scheduled training activity while in
garrison  was  reported  by  60%  of  USAREUR  respondents;  CONUS  observations  were 
similar.  Deviations  took  the  form  of  "crash  courses"  prior  to  SQT  and  ARTEP 
times.  Seventy  percent  of  CONUS/USAREUR  reclassified  respondents  reported  no 
transition  in  unit  other  than  OJT. 

The  training  methods  reported  as  currently  used  to  train  in  support 
planning  consist  of:  the  FTX  (Field  Training  Exercise),  consisting  of  ARTEP, 
REALTRAIN, and GDP Area Walks; the CPX (Command Post Exercise), consisting of
CAMMS, Battle of Eiterfeld, and Dunn Kempf; and classroom lectures. The
typical USAREUR annual FTX training consists of supporting one battalion
ARTEP, one three-week REALTRAIN exercise at Hohenfels MTA, and one/two GDP
terrain walks. Factors limiting the value of the FTX in USAREUR are the
severe constraints on field maneuver, physical separation of units up to
100 km, no feedback on effect of fire support planning, unrealistic representation
of time constraints and availability of assets, and nonrepresentation
of multiple information sources.

The value of the CPX for FIST is compromised by limitations of the simulation.
Fire support planning is infrequently exercised. CAMMS is the most
commonly used, but the company level and FIST are not exercised in CAMMS. The
Dunn Kempf game, a company level simulation, provides training for FIST HQ and
the FO team, but with mixed results. It has questionable validity of weapons
effects and requires excessive time to learn rules and to prepare/set up/run.
The Battle of Eiterfeld, designed in December 1979 to meet FIST needs, has not
yet reached the field units.

One  good  program  exists  in  8th  Infantry  DIVARTY.  It  is  a  three-phased 
program  progressing  from  classroom  instruction  through  CPXs  to  a  combined  arms 
exercise  in  live  firing.  The  phases  are:  1)  Four  concentrated  hours  of  class 
on  fundamentals  of  fire  support  planning;  2)  Three  series  of  Dunn  Kempf  gaming 
exercises  played;  and  3)  Combined  arms  live  fire  exercise  and  evaluation. 

The principal conclusion concerning unit training is that the FIST needs
a separate exercise capability designed to meet its training needs which can
be used to provide frequent training experiences. It also needs the resources
and autonomy to manage its own training and development of its
personnel.

PERSONNEL 

A summary of FIST personnel factors is: FISTs in the field are understaffed
or staffed with underqualified personnel in numbers, grade levels,
training, and experience; there is a potential problem of retention; and
personnel authorizations may not be adequate in experience and depth to sustain
prolonged performance.

Manpower shortfalls exist in both enlisted and officer ranks. Enlisted
ranks are operating at 65% of authorized strength, ranging from 38% to 86%.
The E7 grade showed the most marked shortfall. Officer ranks are operating
at 60% of authorized strength, ranging from 46% to 03%. The Brigade FSO
position is rarely filled by an O4 grade; the O3 grade was only slightly more
common in the Battalion FSO positions. The pool of FIST lieutenants was significantly
understrength. FIST personnel are routinely forced to perform in
job positions intended for higher grade/rank and more experience. FISTs are
routinely sent to the field with less than a complete crew in number, MOS
qualification, and grade/rank.

There  is  a  credibility  problem  for  the  FIST  in  many  units.  Many  company 
commanders do not value FIST or know how to use it, and do not treat it as
part of the command team. A similar gap exists at the level of FO team and
platoon leader. Since deployed FISTs are only a partial implementation of the
doctrinal concept and company commanders routinely see reclassified E5s/E6s as
FIST chiefs, limited use of the FIST chief as fire support coordinator is a
safer course.

Most reclassified FISTs come from MOS related primarily to forward
observer positions. Few reclassified FIST members have had reclassification
training. Thirty-seven percent of the E1-E5 personnel reported lack of
expertise among 13F NCOs as a major impediment to training.

Common sources of dissatisfaction in MOS 13F are that personnel routinely work
in a "shortage" environment and are required to perform in jobs for which they
have not qualified. Further, capable, undergrade FIST personnel often express
frustration at being underutilized and unappreciated. The field artillery
battalion is still focused on guns and the fire direction center. Career and
training needs of the 13F specialists are often subordinated to priorities of
firing batteries.

ORGANIZATIONAL  FACTORS 

FIST needs greater self-sufficiency in managing its own training, resources,
and personnel, and needs to emphasize a forward-looking rather than reactive
role in the management and integration function of FIST. There
is doubt that the present FIST implementation provides sufficient resilience
and flexibility for operation under emergency or degraded conditions.

Eighty percent of FIST officers and enlisted respondents in USAREUR and
90% in CONUS expressed dissatisfaction with current FIST utilization.
Perceived misuse of the fire support section within battalion often was cited as
a major impediment to training. The fundamental problems faced by fielded
units are effective integration into the battalion organization, lack of an
MOS-experienced 13F NCO pool, and lack of time due to excessive support
details. Eighty percent of FIST respondents reported this misuse of the FIST
as a serious factor. Three alternatives proposed for organizing the FIST were:
attach all FIST personnel to the firing battery; attach some or all FIST
lieutenants to the firing batteries with FIST enlisted personnel remaining in
the HQ battery; and consolidate all FIST officers and enlisted personnel in HQ
battery under the Brigade FSO.

IMPACT OF FUTURE SYSTEMS ON THE FIRE SUPPORT TEAM

A summary of the implications of future systems for FIST training must
emphasize the following points. There will be an increased complexity of
HQ tasks in fire support planning, management of fire support resources, and
support to scheme of maneuver. These are the semi-structured tasks, which are
the less well trained tasks at the present time. They are also the more
difficult to train and require "hands-on" exercises as a necessary method.
Training will require greater use of simulation, modeling, and war gaming in
training devices and programs. These methods are costly and not yet
technologically mature. Future systems will have a lesser effect on the
functions of the forward observer.

Recommendations for research and development were made in three areas:
need for more knowledge and better understanding of FIST as a team, its
performance, and the workload imposed by various combat scenarios; need for
improved training materials and delivery systems; and improved retention in
the 13F MOS.


